The main objective here is to produce the data needed for the Rubin Observatory Algorithms Workshop.
The catch-all ticket is . Output repos will be inside /datasets/hsc/repo/rerun/DM-23243/
To access all DRP data products, use the following paths on lsst-* machines to instantiate your Gen2 Butler instance:
(DEEP+UDEEP) /datasets/hsc/repo/rerun/DM-23243/OBJECT/DEEP/
(WIDE) /datasets/hsc/repo/rerun/DM-23243/OBJECT/WIDE/
Exceptions for QA outputs:
pipe_analysis outputs are in /datasets/hsc/repo/rerun/DM-23243/ANALYSIS/DEEP/ and /datasets/hsc/repo/rerun/DM-23243/ANALYSIS/WIDE/
validate_drp outputs are in /datasets/hsc/repo/rerun/DM-23243/validateDrp/
Job logs are at /datasets/hsc/repo/rerun/DM-23243/logs/
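As a minimal sketch of the access pattern described above, the snippet below builds the OBJECT repository path for a layer and opens a Gen2 Butler on it. The helper names `object_repo` and `open_butler` are illustrative only, and the dataset type and data ID in the usage comment are assumptions chosen for the example rather than values taken from this page. The Butler import requires an LSST stack environment.

```python
# Root of the reprocessing outputs, as listed above.
RERUN_ROOT = "/datasets/hsc/repo/rerun/DM-23243"

# Layers that have an OBJECT repository on this page (DEEP also holds UDEEP).
LAYERS = ("DEEP", "WIDE")


def object_repo(layer):
    """Return the OBJECT repository path for a layer (hypothetical helper)."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer {layer!r}; expected one of {LAYERS}")
    return f"{RERUN_ROOT}/OBJECT/{layer}/"


def open_butler(layer):
    """Instantiate a Gen2 Butler on the layer's OBJECT repository."""
    # Imported lazily so the path helper also works without the LSST stack.
    from lsst.daf.persistence import Butler
    return Butler(object_repo(layer))


# Example usage on an lsst-* machine (dataset type and data ID are assumptions):
# butler = open_butler("DEEP")
# catalog = butler.get("deepCoadd_forced_src", tract=9813, patch="4,4", filter="HSC-I")
```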
See the individual tickets linked in the Job Summary Table for details on running each pipeline.
Input dataset: HSC PDR2
1. What data products do we need for the Algorithms Workshop?
Number of visits read from /datasets/hsc/repo/registry.sqlite3 (These are processed by singleFrameDriver)
field\filter | HSC-G | HSC-I | HSC-I2 | HSC-R | HSC-R2 | HSC-Y | HSC-Z | NB0387 | NB0816 | NB0921 | Total |
---|---|---|---|---|---|---|---|---|---|---|---|
SSP_UDEEP_SXDS | 18 | 31 | 5 | 18 | | 46 | 53 | | 30 | 32 | 233 |
SSP_UDEEP_COSMOS | 56 | 36 | 104 | 25 | 43 | 212 | 226 | | 25 | 50 | 777 |
SSP_DEEP_XMM_LSS/ SSP_DEEP_XMMS_LSS | 35 | 18 | | 27 | | 30 | 52 | 20 | 22 | | 204 |
SSP_DEEP_ELAIS_N1 | 76 | 28 | 44 | 43 | 24 | 99 | 142 | | 37 | 38 | 531 |
SSP_DEEP_DEEP2_3 | 48 | 32 | 6 | 47 | | 75 | 108 | 28 | 40 | 33 | 417 |
SSP_DEEP_COSMOS | 103 | 40 | 75 | 32 | 74 | 111 | 168 | 26 | | 51 | 680 |
SSP_WIDE | 2519 | 916 | 1863 | 1363 | 1356 | 3207 | 3216 | | | | 14440 |
SSP_AEGIS | 8 | 7 | | 5 | | 7 | 7 | | | | 34 |
SSP Total | 2863 | 1108 | 2097 | 1560 | 1497 | 3787 | 3972 | 74 | 154 | 204 | 17316 |
(UH) COSMOS | 21 | 90 | 67 | 21 | (Ignore 7) | ||||||
Total | 17151 |
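A table like the one above can be tallied straight from the registry. The sketch below is one possible way to do it, not the query actually used for this page. It assumes the Gen2 registry has a `raw` table with `field`, `filter`, and `visit` columns and one row per CCD, so visits are counted with `DISTINCT`.

```python
import sqlite3

REGISTRY = "/datasets/hsc/repo/registry.sqlite3"

# Assumed schema: table `raw` with columns field, filter, visit (one row per CCD).
QUERY = """
    SELECT field, "filter", COUNT(DISTINCT visit) AS n_visits
    FROM raw
    GROUP BY field, "filter"
    ORDER BY field, "filter"
"""


def count_visits(conn):
    """Return {(field, filter): number of distinct visits} for a registry connection."""
    return {(field, band): n for field, band, n in conn.execute(QUERY)}


# Example usage, opening the registry read-only:
# conn = sqlite3.connect(f"file:{REGISTRY}?mode=ro", uri=True)
# for (field, band), n in count_visits(conn).items():
#     print(field, band, n)
```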
Number of visits used in coaddition: 2792 for DEEP+UDEEP and 11821 for WIDE. (Only visits in NAOJ's tract-visits list were used.)
Tract list copied from the HSC release page, the table of "database records":
UDEEP+DEEP | Filters | Tracts |
---|---|---|
COSMOS | g,r,i,z,y,NB0387,NB0816,NB0921 | 9569-9572, 9812-9814, 10054-10056 |
DEEP2-3 | g,r,i,z,y,NB0387,NB0816,NB0921 | 9219-9221, 9462-9465, 9706-9708 |
ELAIS-N1 | g,r,i,z,y,NB0816,NB0921 | 16984-16985, 17129-17131, 17270-17272, 17406-17407 |
SXDS+XMM-LSS | g,r,i,z,y,NB0387,NB0816,NB0921 | 8282-8284, 8523-8525, 8765-8767 |
In total, UDEEP+DEEP covers 39 tracts.
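The tract count can be checked by expanding the range notation used in the table. The following is a small sketch; `expand_tracts` is a hypothetical helper, and the range strings are copied from the UDEEP+DEEP table above.

```python
def expand_tracts(spec):
    """Expand a spec such as "9569-9572, 9812-9814" into a sorted list of tract IDs."""
    tracts = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        # "a-b" is an inclusive range; a bare "a" is a single tract.
        first, _, last = part.partition("-")
        start = int(first)
        stop = int(last) if last else start
        tracts.update(range(start, stop + 1))
    return sorted(tracts)


UDEEP_DEEP = {
    "COSMOS": "9569-9572, 9812-9814, 10054-10056",
    "DEEP2-3": "9219-9221, 9462-9465, 9706-9708",
    "ELAIS-N1": "16984-16985, 17129-17131, 17270-17272, 17406-17407",
    "SXDS+XMM-LSS": "8282-8284, 8523-8525, 8765-8767",
}

n_tracts = sum(len(expand_tracts(spec)) for spec in UDEEP_DEEP.values())
print(n_tracts)  # 39
```

The same helper can be applied to the WIDE rows to turn them into explicit tract lists for job submission.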
WIDE | Filters | Tracts |
---|---|---|
W01 (WIDE01H) | g,r,i,z,y | 8994-8999, 9236-9242, 9479-9485, 9722-9728, 9964-9969 |
W02 (XMM) | g,r,i,z,y | 8278-8286, 8519-8527, 8761-8769, 9003-9011, 9245-9253, 9488-9496, 9731-9739, 9973-9981, 10215-10223 |
W03 (GAMA09H) | g,r,i,z,y | 9069-9092, 9312-9335, 9555-9578, 9797-9820, 10039-10051, 10053-10057, 10282-10293, 10296-10298 |
W04 (WIDE12H+GAMA15H) | g,r,i,z,y | 9096-9136, 9338-9379, 9581-9622, 9824-9864, 10079-10084, 10101-10106, 10321-10326, 10343-10348 |
W05 (VVDS) | g,r,i,z,y | 8984-8986, 9206-9233, 9448-9476, 9691-9719, 9933-9960, 10175-10195, 10417-10436, 10659-10677, 10899-10904, 10912-10917 |
W06 (HECTOMAP) | g,r,i,z,y | 15808-15834, 15987-16012, 16162-16186 |
W07 (AEGIS) | g,r,i,z,y | 16821-16822, 16972-16973 |
The tract IDs for which we have data products in the WIDE layer: tract_id_wide.txt
2. Stack versions, pipeline steps and configs:
To get this running as soon as possible, we are comfortable using different stack versions for different steps this time.
These use the /software/lsstsw/stack_20191101 shared stack.
config.makeCoaddTempExp.externalPhotoCalibName='fgcm'
config.assembleCoadd.externalPhotoCalibName='fgcm'
config.assembleCoadd.assembleStaticSkyModel.externalPhotoCalibName='fgcm'
The following use the new shared stack at /software/lsstsw/stack_20200220
Pipeline commands: https://github.com/lsst-dm/s20-hsc-pdr2-reprocessing
Discussions:
We want jointcal for astrometry and fgcm for photometry.
jointcal on UDEEP takes days. Each filter can run on a separate node. The deepest tract needs ~3 nodes for ~5 days, so allow 14+ days of walltime for UDEEP.
3. Infrastructure: compute & disk space – Michelle B is aware and has it under control.
4. Human resources from NCSA?
5. Waiting for:
6. Job status and summary
Pipeline | DEEP & UDEEP | WIDE | Total node-hours |
---|---|---|---|
singleFrameDriver | | | 2758.08 |
skymap | | | 0.02 |
jointcal | | | 3466.34 |
fgcmcal | | | 83.45 |
skyCorrection | | | 369.50 |
coadd | | | 3735.56 |
multiband | | | 20792.75 |
post-processing | | | 152.68 |
forcedPhotCcd (low priority) | | | 3050.69 |
matchedVisitMetrics (validate_drp) | | | 1233.11 |
visitAnalysis | | | 4298.16 |
CompareVisitAnalysis (low priority) | | | 724.71 |
colorAnalysis | | | 26.06 |
coaddAnalysis | | | 2231.57 |
matchVisits (qa_explorer) | | | 15.42 |
7. Reproducible Pipelines Failures - singleFrameDriver
DEEP+UDEEP:
301 CCDs failed in UDEEP; their data IDs are in fatals_id_udeep.txt
1730 CCDs failed in DEEP; their data IDs are in fatals_id_deep.txt
Among these 2031 reproducible failures:
WIDE:
1390 CCDs failed in WIDE; their data IDs are in fatals_id_wide.txt
8. Reproducible Pipelines Errors - Jointcal
Some runs report "ERROR: Potentially bad fit: High chi-squared/ndof". The affected data IDs are attached to DM-23323 and DM-23395.
(This may only happen in tracts with few visits.)
9. Reproducible Pipelines Failures - skyCorrection
Visits 137268 and 137288 failed with the error "No good pixels in image array"; only 1 and 2 calexps exist for these visits. DM-23551 has been filed.
Both visits are 30s exposures in NB0387 from 2018-01-14. They are not needed in the coadd, so the reprocessing campaign continued without them.
10. FGCM
fgcm_photoCalib products were not written for some visits. See DM-23394 and DM-23698.
In total, 138 visits are missing some fgcm_photoCalib products. Some visits are missing fgcm_photoCalib for all CCDs, others only for selected CCDs.
The data IDs missing fgcm_photoCalib are:
(DEEP+UDEEP) https://jira.lsstcorp.org/secure/attachment/42853/42853_fgcmNoPhoto_deep.txt
(WIDE) https://jira.lsstcorp.org/secure/attachment/42854/42854_fgcmNoPhoto_wide.txt
Where fgcm_photoCalib is missing, no downstream data products exist for those visits/CCDs.
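The split between visits missing fgcm_photoCalib for all CCDs and for only some CCDs could be reproduced with a check along the following lines. This is a sketch rather than the script used for the attached lists. `classify_missing` is a hypothetical helper, it assumes fgcm_photoCalib is keyed by `visit` and `ccd`, and it relies only on the Gen2 `Butler.datasetExists` call.

```python
from collections import defaultdict


def classify_missing(butler, visits, ccds, dataset="fgcm_photoCalib"):
    """Split visits into those missing `dataset` for all CCDs and for some CCDs.

    `butler` is any object exposing the Gen2 `datasetExists(datasetType, dataId=...)`
    method. Returns (visits missing every CCD, {visit: [missing CCDs]}).
    """
    ccds = list(ccds)
    missing = defaultdict(list)
    for visit in visits:
        for ccd in ccds:
            if not butler.datasetExists(dataset, dataId={"visit": visit, "ccd": ccd}):
                missing[visit].append(ccd)
    all_missing = sorted(v for v, c in missing.items() if len(c) == len(ccds))
    partly_missing = {v: c for v, c in missing.items() if len(c) < len(ccds)}
    return all_missing, partly_missing
```

Note that a CCD which already failed in singleFrameDriver would also show up as missing here, so in practice the CCD list per visit should be restricted to those with an existing calexp.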
11. Reproducible Pipelines Errors - coadd
Among the many warnings, some also mention errors:
See DM-23602.
12. Reproducible Pipelines Failures - matchedVisitMetrics (validate_drp)
If a tract+filter combination has only one visit, the task cannot work (DM-23581), so we do not run those cases.
For WIDE, 15 failed with "FATAL: Failed: `ydata` must not be empty".
For DEEP,
See DM-23654.
Also note that the outputs are not proper Butler rerun repos; the task does not write its outputs through the Butler.
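The single-visit exclusion described above can be sketched as a simple grouping step before job submission. The input format (an iterable of `(tract, filter, visit)` tuples) and the helper name `runnable_groups` are assumptions made for illustration. The visit numbers in the example are made up.

```python
from collections import defaultdict


def runnable_groups(tract_filter_visits, min_visits=2):
    """Group visits by (tract, filter) and keep only groups with enough visits."""
    groups = defaultdict(set)
    for tract, band, visit in tract_filter_visits:
        groups[(tract, band)].add(visit)
    return {key: sorted(v) for key, v in groups.items() if len(v) >= min_visits}


# Hypothetical input: tract 9813 has two HSC-I visits but only one HSC-Y visit.
example = [
    (9813, "HSC-I", 1001),
    (9813, "HSC-I", 1002),
    (9813, "HSC-Y", 2001),
]
print(runnable_groups(example))  # {(9813, 'HSC-I'): [1001, 1002]}
```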
13. Reproducible Pipelines Errors - coaddAnalysis (pipe_analysis)
14. Reproducible Pipelines Failures - colorAnalysis (pipe_analysis)
15. Reproducible Pipelines Failures - forcedPhotCcd
See DM-23867 for the data IDs.
Compute Time
See the Job Summary Table for the breakdowns.
The total compute for HSC-PDR2 is 31205.7 node-hours up to and including multiband processing (that is, excluding forcedPhotCcd, post-processing, and all QA pipelines, which is the same scope as the S18 PDR1 run, for a fair comparison). Before execution, the cost was estimated by simply scaling the PDR1 figure by a factor of 3: 9227.15*3 = 27681.45 node-hours. The actual figure is only ~13% higher than that estimate.
All non-QA pipelines sum to 34409.07 node-hours.
All jobs sum to 42938.10 node-hours.
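The three totals and the comparison with the pre-run estimate can be reproduced from the per-pipeline node-hours in the Job Summary Table. The snippet below is only a bookkeeping sketch. The dictionary keys are shortened pipeline names, and the order follows the table so that the cumulative sums line up.

```python
# Total node-hours per pipeline, in Job Summary Table order.
NODE_HOURS = {
    "singleFrameDriver": 2758.08,
    "skymap": 0.02,
    "jointcal": 3466.34,
    "fgcmcal": 83.45,
    "skyCorrection": 369.50,
    "coadd": 3735.56,
    "multiband": 20792.75,
    "post-processing": 152.68,
    "forcedPhotCcd": 3050.69,
    "matchedVisitMetrics": 1233.11,
    "visitAnalysis": 4298.16,
    "CompareVisitAnalysis": 724.71,
    "colorAnalysis": 26.06,
    "coaddAnalysis": 2231.57,
    "matchVisits": 15.42,
}

steps = list(NODE_HOURS)
# Everything up to and including multiband (same scope as the S18 PDR1 run).
up_to_multiband = sum(NODE_HOURS[s] for s in steps[: steps.index("multiband") + 1])
# All non-QA pipelines: the above plus post-processing and forcedPhotCcd.
non_qa = sum(NODE_HOURS[s] for s in steps[: steps.index("forcedPhotCcd") + 1])
total = sum(NODE_HOURS.values())

estimate = 9227.15 * 3  # PDR1 node-hours scaled by 3
excess = up_to_multiband / estimate - 1

print(f"{up_to_multiband:.2f} {non_qa:.2f} {total:.2f} {estimate:.2f} {excess:.1%}")
# 31205.70 34409.07 42938.10 27681.45 12.7%
```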