The main objective here is to have the needed data for the Rubin Observatory Algorithms Workshop

The catch-all ticket is .  Output repos will be inside /datasets/hsc/repo/rerun/DM-23243/ 

To access all DRP data products, use the following paths on lsst-* machines to instantiate your Gen2 Butler instance:    

(DEEP+UDEEP)    /datasets/hsc/repo/rerun/DM-23243/OBJECT/DEEP/

(WIDE)    /datasets/hsc/repo/rerun/DM-23243/OBJECT/WIDE/
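Once a rerun path is chosen, data products can be fetched through the Gen2 Butler. A minimal sketch (the dataset type and data ID values below are illustrative, not taken from this page; requires the LSST Science Pipelines stack):

```python
# Hypothetical sketch: fetch a PDR2 coadd via the Gen2 Butler.
# The repo path is one of the rerun directories listed above;
# tract/patch/filter values here are examples only.
def get_coadd(repo, tract, patch, filt):
    from lsst.daf.persistence import Butler  # Gen2 Butler
    butler = Butler(repo)
    # e.g. repo="/datasets/hsc/repo/rerun/DM-23243/OBJECT/WIDE/",
    #      tract=8994, patch="4,4", filt="HSC-I"
    return butler.get("deepCoadd_calexp", tract=tract, patch=patch, filter=filt)
```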

Exceptions for QA outputs:
  •  pipe_analysis outputs are in /datasets/hsc/repo/rerun/DM-23243/ANALYSIS/DEEP/ and /datasets/hsc/repo/rerun/DM-23243/ANALYSIS/WIDE/
  •  validate_drp outputs are in /datasets/hsc/repo/rerun/DM-23243/validateDrp/

Job logs are at  /datasets/hsc/repo/rerun/DM-23243/logs/  

See the individual tickets linked in the Job Summary Table for details on running each pipeline.



Input dataset: HSC PDR2 

1. What data products do we need for the Algorithms Workshop? 

Number of visits, read from /datasets/hsc/repo/registry.sqlite3 (these are processed by singleFrameDriver):

field \ filter      | HSC-G | HSC-I | HSC-I2 | HSC-R | HSC-R2 | HSC-Y | HSC-Z | NB0387 | NB0816 | NB0921 | Total
SSP_UDEEP_SXDS      |    18 |    31 |      5 |    18 |        |    46 |    53 |        |     30 |     32 |   233
SSP_UDEEP_COSMOS    |    56 |    36 |    104 |    25 |     43 |   212 |   226 |        |     25 |     50 |   777
SSP_DEEP_XMM_LSS /
SSP_DEEP_XMMS_LSS   |    35 |    18 |        |    27 |        |    30 |    52 |     20 |     22 |        |   204
SSP_DEEP_ELAIS_N1   |    76 |    28 |     44 |    43 |     24 |    99 |   142 |        |     37 |     38 |   531
SSP_DEEP_DEEP2_3    |    48 |    32 |      6 |    47 |        |    75 |   108 |     28 |     40 |     33 |   417
SSP_DEEP_COSMOS     |   103 |    40 |     75 |    32 |     74 |   111 |   168 |     26 |        |     51 |   680
SSP_WIDE            |  2519 |   916 |   1863 |  1363 |   1356 |  3207 |  3216 |        |        |        | 14440
SSP_AEGIS           |     8 |     7 |        |     5 |        |     7 |     7 |        |        |        |    34
SSP Total           |  2863 |  1108 |   2097 |  1560 |   1497 |  3787 |  3972 |     74 |    154 |    204 | 17316

(UH) COSMOS: 2190 6721 7206 199 (Ignore 7)
Total: 17151


Number of visits used in coaddition: 2792 visits for DEEP+UDEEP; 11821 visits for WIDE. (Only visits from NAOJ's tract-visit lists were used.)

Tract lists are copied from the HSC release page, from the table of "database records":

UDEEP+DEEP   | Filters                        | Tracts
COSMOS       | g,r,i,z,y,NB0387,NB0816,NB0921 | 9569-9572, 9812-9814, 10054-10056
DEEP2-3      | g,r,i,z,y,NB0387,NB0816,NB0921 | 9219-9221, 9462-9465, 9706-9708
ELAIS-N1     | g,r,i,z,y,NB0816,NB0921        | 16984-16985, 17129-17131, 17270-17272, 17406-17407
SXDS+XMM-LSS | g,r,i,z,y,NB0387,NB0816,NB0921 | 8282-8284, 8523-8525, 8765-8767

In total 39 tracts UDEEP+DEEP.
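The 39-tract count can be checked by expanding the tract ranges above. A small sketch (range strings copied from the table; `expand` is a hypothetical helper, not part of any pipeline):

```python
# Expand "9569-9572, 9812-9814, ..." range strings into tract ID lists
# and count the unique UDEEP+DEEP tracts.
def expand(ranges):
    """Turn a comma-separated list of ID ranges into a list of tract IDs."""
    tracts = []
    for part in ranges.split(","):
        lo, _, hi = part.strip().partition("-")
        tracts.extend(range(int(lo), int(hi or lo) + 1))
    return tracts

udeep_deep = {
    "COSMOS": "9569-9572, 9812-9814, 10054-10056",
    "DEEP2-3": "9219-9221, 9462-9465, 9706-9708",
    "ELAIS-N1": "16984-16985, 17129-17131, 17270-17272, 17406-17407",
    "SXDS+XMM-LSS": "8282-8284, 8523-8525, 8765-8767",
}

all_tracts = sorted(set(t for r in udeep_deep.values() for t in expand(r)))
print(len(all_tracts))  # 39
```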

WIDE                  | Filters   | Tracts
W01 (WIDE01H)         | g,r,i,z,y | 8994-8999, 9236-9242, 9479-9485, 9722-9728, 9964-9969
W02 (XMM)             | g,r,i,z,y | 8278-8286, 8519-8527, 8761-8769, 9003-9011, 9245-9253, 9488-9496, 9731-9739, 9973-9981, 10215-10223
W03 (GAMA09H)         | g,r,i,z,y | 9069-9092, 9312-9335, 9555-9578, 9797-9820, 10039-10051, 10053-10057, 10282-10293, 10296-10298
W04 (WIDE12H+GAMA15H) | g,r,i,z,y | 9096-9136, 9338-9379, 9581-9622, 9824-9864, 10079-10084, 10101-10106, 10321-10326, 10343-10348
W05 (VVDS)            | g,r,i,z,y | 8984-8986, 9206-9233, 9448-9476, 9691-9719, 9933-9960, 10175-10195, 10417-10436, 10659-10677, 10899-10904, 10912-10917
W06 (HECTOMAP)        | g,r,i,z,y | 15808-15834, 15987-16012, 16162-16186
W07 (AEGIS)           | g,r,i,z,y | 16821-16822, 16972-16973

The tract IDs for which we have data products in the WIDE layer: tract_id_wide.txt
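For a rough count of the WIDE tracts listed in the ranges above (a sketch only; tract_id_wide.txt is the authoritative list of tracts with data products):

```python
# Expand the WIDE tract ranges from the table above and count unique tracts.
wide_ranges = {
    "W01": "8994-8999, 9236-9242, 9479-9485, 9722-9728, 9964-9969",
    "W02": "8278-8286, 8519-8527, 8761-8769, 9003-9011, 9245-9253, "
           "9488-9496, 9731-9739, 9973-9981, 10215-10223",
    "W03": "9069-9092, 9312-9335, 9555-9578, 9797-9820, 10039-10051, "
           "10053-10057, 10282-10293, 10296-10298",
    "W04": "9096-9136, 9338-9379, 9581-9622, 9824-9864, 10079-10084, "
           "10101-10106, 10321-10326, 10343-10348",
    "W05": "8984-8986, 9206-9233, 9448-9476, 9691-9719, 9933-9960, "
           "10175-10195, 10417-10436, 10659-10677, 10899-10904, 10912-10917",
    "W06": "15808-15834, 15987-16012, 16162-16186",
    "W07": "16821-16822, 16972-16973",
}

wide_tracts = set()
for ranges in wide_ranges.values():
    for part in ranges.split(","):
        lo, _, hi = part.strip().partition("-")
        wide_tracts.update(range(int(lo), int(hi) + 1))

print(len(wide_tracts))  # 704 tracts defined across the seven WIDE fields
```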


2. Stack versions, pipeline steps and configs:

To get this running ASAP, we are comfortable using different stack versions for different steps this time.

These use the /software/lsstsw/stack_20191101 shared stack.

The following use the new shared stack at /software/lsstsw/stack_20200220

Pipeline commands:  https://github.com/lsst-dm/s20-hsc-pdr2-reprocessing

Discussions:


3. Infrastructure: compute & disk space – Michelle B is aware and has it under control. 

4. Human resources from NCSA? 


5. Waiting for: 


6. Job status and summary  


Total node-hours per pipeline step (DEEP & UDEEP and WIDE combined):

  •  singleFrameDriver: 2758.08
  •  skymap (slurm job IDs 229995, 229996): 0.02
  •  jointcal: 3466.34
  •  fgcmcal: 83.45
  •  skyCorrection: 369.50
  •  coadd: 3735.56
  •  multiband: 20792.75
  •  post-processing: 152.68
  •  forcedPhotCcd (low priority; DM-23867, slurm job IDs 246937, 247134, 247135, 247264, 247740, 247762, 247852): 3050.69
  •  matchedVisitMetrics (validate_drp; the new validateDrp.py?): 1233.11
  •  visitAnalysis: 4298.16
  •  compareVisitAnalysis (low priority): 724.71
  •  colorAnalysis (DM-23866, slurm job IDs 47430, 247453, 247456): 26.06
  •  coaddAnalysis: 2231.57
  •  matchVisits (qa_explorer; DM-23831, slurm job IDs 245159, 246742, 246820): 15.42


7. Reproducible Pipelines Failures - singleFrameDriver

DEEP+UDEEP: 

301 CCDs failed in UDEEP; their data IDs are in fatals_id_udeep.txt. 1730 CCDs failed in DEEP; their data IDs are in fatals_id_deep.txt.
This gives 2031 reproducible failures in total.

WIDE: 

1390 CCDs failed in WIDE. Their data IDs are in fatals_id_wide.txt.


8. Reproducible Pipelines Errors - Jointcal 

Seeing some "ERROR: Potentially bad fit: High chi-squared/ndof." messages. Data IDs are attached to DM-23323 and DM-23395.

(Maybe only in tracts with few visits?)


9. Reproducible Pipelines Failures - skyCorrection 

Visits 137268 and 137288 failed with the error "No good pixels in image array"; only 1 and 2 calexps exist for these visits, respectively. DM-23551 is filed.

Both visits are 30 s exposures in NB0387 from 2018-01-14; they are not needed in the coadds, so the reprocessing campaign can continue without them.


10. FGCM 

fgcm_photoCalib products were not written for some visits. See DM-23394 and DM-23698.

In total, 138 visits are missing some fgcm_photoCalib products. Some visits are missing fgcm_photoCalib for all CCDs, and some for only selected CCDs.

The data IDs missing fgcm_photoCalib are

(DEEP+UDEEP) https://jira.lsstcorp.org/secure/attachment/42853/42853_fgcmNoPhoto_deep.txt 

(WIDE) https://jira.lsstcorp.org/secure/attachment/42854/42854_fgcmNoPhoto_wide.txt

A missing fgcm_photoCalib means no downstream data products for those visits/CCDs.


11. Reproducible Pipelines Errors -  coadd

Among the many warnings, some logs also mention errors. See DM-23602.


12. Reproducible Pipelines Failures - matchedVisitMetrics (validate_drp) 

If a tract+filter combination has only one visit, the task cannot run (DM-23581), so we do not run those cases.


For WIDE, 15 failed with "FATAL: Failed: `ydata` must not be empty".

For DEEP, 

See DM-23654.

Also note that the outputs are not proper Butler rerun repos; the task does not write its outputs via the Butler.


13. Reproducible Pipelines Errors - coaddAnalysis (pipe_analysis) 


14. Reproducible Pipelines Failures - colorAnalysis (pipe_analysis)

15. Reproducible Pipelines Failures - forcedPhotCcd


Compute Time

See the Job Summary Table for the breakdowns. 

The total compute for HSC-PDR2 is 31205.7 node-hours up to and including multiband processing (that is, no forcedPhotCcd, no post-processing, and no QA pipelines of any kind, the same scope as the S18 PDR1 run, for a fair comparison). Before execution, it was estimated by simply scaling the PDR1 total by 3: 9227.15 * 3 = 27681.45 node-hours; the actual total came in only ~13% above that estimate.

All non-QA pipelines sum to 34409.07 node-hours. 

All jobs sum to 42938.10 node-hours.
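These three totals can be cross-checked against the per-step node-hours in the Job Summary Table. A small sketch (step names and numbers copied from the table; the groupings are the ones described above):

```python
# Cross-check the node-hour totals using the per-step numbers
# from the Job Summary Table.
node_hours = {
    "singleFrameDriver": 2758.08,
    "skymap": 0.02,
    "jointcal": 3466.34,
    "fgcmcal": 83.45,
    "skyCorrection": 369.50,
    "coadd": 3735.56,
    "multiband": 20792.75,
    "post-processing": 152.68,
    "forcedPhotCcd": 3050.69,
    "matchedVisitMetrics": 1233.11,
    "visitAnalysis": 4298.16,
    "compareVisitAnalysis": 724.71,
    "colorAnalysis": 26.06,
    "coaddAnalysis": 2231.57,
    "matchVisits": 15.42,
}

up_to_multiband = ["singleFrameDriver", "skymap", "jointcal", "fgcmcal",
                   "skyCorrection", "coadd", "multiband"]
non_qa = up_to_multiband + ["post-processing", "forcedPhotCcd"]

total_multiband = round(sum(node_hours[s] for s in up_to_multiband), 2)  # 31205.7
total_non_qa = round(sum(node_hours[s] for s in non_qa), 2)              # 34409.07
total_all = round(sum(node_hours.values()), 2)                           # 42938.1

# Pre-execution estimate: 3x the PDR1 total of 9227.15 node-hours.
estimate = 9227.15 * 3  # 27681.45
print(f"actual/estimate = {total_multiband / estimate:.3f}")  # ~1.13, i.e. ~13% more
```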