This page collects notes about the HSC reprocessing in cycle S17B, including processing of the RC dataset and of the full PDR1 dataset. The PDR1 and RC datasets are described below, along with the software stack and pipelines used in the processing.
A stack based on w_2017_17 was used. The output repositories are available to DM team members at:
/datasets/hsc/repo/rerun/DM-10404/UDEEP/
/datasets/hsc/repo/rerun/DM-10404/DEEP/
/datasets/hsc/repo/rerun/DM-10404/WIDE/
A description of the aims and organization of this project is available here.
The PDR1 dataset has been transferred to the LSST GPFS storage /datasets, and the butler repo is available at /datasets/hsc/repo
It includes 5654 visits in 7 bands: HSC-G, HSC-R, HSC-I, HSC-Y, HSC-Z, NB0816, NB0921. Their visit IDs are listed in visitId-SSPPDR1.txt. The official release site is https://hsc-release.mtk.nao.ac.jp/
The survey has three layers and includes 8 fields.
| Layer | Field Name ("OBJECT") | HSC-G | HSC-R | HSC-I | HSC-Z | HSC-Y | NB0921 | NB0816 | Tract IDs (from https://hsc-release.mtk.nao.ac.jp/doc/index.php/database/ ) |
|---|---|---|---|---|---|---|---|---|---|
| DEEP | SSP_DEEP_ELAIS_N1 | 32 | 24 | 28 | 51 | 24 | 20 | 0 | 16984, 16985, 17129, 17130, 17131, 17270, 17271, 17272, 17406, 17407 |
| DEEP | SSP_DEEP_DEEP2_3 | 32 | 31 | 32 | 44 | 32 | 23 | 17 | 9220, 9221, 9462, 9463, 9464, 9465, 9706, 9707, 9708 |
| DEEP | SSP_DEEP_XMM_LSS, SSP_DEEP_XMMS_LSS | 25 | 27 | 18 | 21 | 25 | 0 | 0 | 8282, 8283, 8284, 8523, 8524, 8525, 8765, 8766, 8767 |
| DEEP | SSP_DEEP_COSMOS | 20 | 20 | 40 | 48 | 16 | 18 | 0 | 9569, 9570, 9571 |
| UDEEP | SSP_UDEEP_SXDS | 18 | 18 | 31 | 43 | 46 | 21 | 19 | 8523, 8524, 8765, 8766 |
| UDEEP | SSP_UDEEP_COSMOS | 19 | 19 | 35 | 33 | 55 | 29 | 0 | 9570, 9571, 9812, 9813, 9814, 10054, 10055 |
| WIDE | SSP_AEGIS | 8 | 5 | 7 | 7 | 7 | 0 | 0 | 16821, 16822, 16972, 16973 |
| WIDE | SSP_WIDE | 913 | 818 | 916 | 991 | 928 | 0 | 0 | XMM: 8279-8285, 8520-8526, 8762-8768; GAMA09H: 9314-9318, 9557-9562, 9800-9805; WIDE12H: 9346-9349, 9589-9592; GAMA15H: 9370-9375, 9613-9618; HECTOMAP: 15830-15833, 16008-16011; VVDS: 9450-9456, 9693-9699, 9935-9941 |
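The SSP_WIDE tract IDs above are abbreviated as ranges (e.g. "8279-8285"). As a convenience, such a range specification can be expanded into explicit tract IDs with a short helper; the function below is a hypothetical sketch, not part of the stack.

```python
# Sketch: expand comma-separated tract IDs and ID ranges (as written in
# the SSP_WIDE row of the table above) into an explicit list of ints.
# The helper name is hypothetical, purely for illustration.
def expand_tract_ranges(spec):
    """Expand e.g. "8279-8285, 8520-8526" into [8279, ..., 8526]."""
    tracts = []
    for part in spec.replace(" ", "").split(","):
        if "-" in part:
            lo, hi = map(int, part.split("-"))
            tracts.extend(range(lo, hi + 1))
        else:
            tracts.append(int(part))
    return tracts

# The XMM field of SSP_WIDE spans three ranges of 7 tracts each:
xmm = expand_tract_ranges("8279-8285, 8520-8526, 8762-8768")
print(len(xmm))  # 21
```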
Plots of tracts/patches: https://hsc-release.mtk.nao.ac.jp/doc/index.php/data/
Note: tract 9572 is listed on the HSC PDR1 website for DEEP_COSMOS, but no data actually overlap it; PDR1 does not include it either.
Note: In S17B, more tracts than listed were processed. See below.
The RC dataset was originally defined in https://hsc-jira.astro.princeton.edu/jira/browse/HSC-1361 for hscPipe 3.9.0.
The RC dataset is public and available at /datasets/. 62 of these visits were not included in PDR1: two from SSP_WIDE and 60 from SSP_UDEEP_COSMOS; their visit IDs are 274 276 278 280 282 284 286 288 290 292 294 296 298 300 302 306 308 310 312 314 316 320 334 342 364 366 368 370 1236 1858 1860 1862 1878 9864 9890 11742 28354 28356 28358 28360 28362 28364 28366 28368 28370 28372 28374 28376 28378 28380 28382 28384 28386 28388 28390 28392 28394 28396 28398 28400 28402 29352 (also see here).
The RC dataset includes (a) 237 visits of SSP_UDEEP_COSMOS and (b) 83 visits of SSP_WIDE, in 6 bands:
The LSST software stack is used; its Getting Started documentation is at https://pipelines.lsst.io
Stack version: w_2017_17 (published on 26-Apr-2017) + master meas_mosaic/obs_subaru/ctrl_pool of 7-May-2017 built with w_2017_17 (i.e. w_2017_17 + DM-10315 + DM-10449 + DM-10430).
This means the PS1 reference catalog "ps1_pv3_3pi_20170110" in the LSST format (HTM indexed) was used (/datasets/refcats/htm/ps1_pv3_3pi_20170110/).
The calibration dataset is the 20170105 version from Paul Price; the calibration repo is located at /datasets/hsc/calib/20170105.
The externally provided bright object masks (butler type "brightObjectMask") of version "Arcturus" are added to the repo and applied in coaddDriver.assembleCoadd.
Pipeline steps and configs:
forcedPhotCcd.py (note: it was added late and hence was not run in the RC processing)
Operational configurations that differ from the tagged stack, such as the logging configuration in ctrl_pool, may be used (e.g. DM-10430).
In the full PDR1 reprocessing, everything was run with the same stack version and configuration. Reproducible failures are noted below, but no reprocessing was done with a newer software version.
This stack version had a known science problem of bad ellipticity residuals; the bug fix was merged to the stack on May 30 and hence was not applied in this reprocessing campaign.
coaddDriver:
multiband: typically one core per patch is wanted, so the upper limit of useful cores is the number of patches multiplied by the number of filters
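As a worked example of that upper bound (the numbers below are taken from the RC WIDE run described later on this page, which has 81 patches per tract in 5 filters; they are illustrative, not a recommendation):

```python
# Rough upper bound on useful cores for the multiband step: one core per
# patch, so patches * filters. Example numbers from the RC WIDE tracts
# (81 patches per tract, 5 filters).
patches_per_tract = 81
n_filters = 5
max_useful_cores = patches_per_tract * n_filters
print(max_useful_cores)  # 405
```

Requesting more cores than this per tract would leave the extra cores idle.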
These pipelines will be run in units no smaller than the following:
forcedPhotCcd.py: ccd
Data of different layers (DEEP/UDEEP/WIDE) are processed separately.
Example commands used in the RC processing are in exampleRcProcessing.sh. The pipe_drivers documentation from 2016 is at https://dmtn-023.lsst.io
The output data products of each step, their butler dataset types, and their butler policy templates are summarized in "S17B Output dataset types of pipe_drivers tasks for HSC" for the w_2017_17 stack.
JIRA ticket: . The RC processing used the w_2017_17 stack and meas_mosaic ecfbc9d built with w_2017_17.
singleFrameDriver: reproducible failures in 46 CCDs from 23 visits. The failed visit/CCDs are the same as those in the w_2017_14 stack. Their data IDs are:
--id visit=278 ccd=95 --id visit=280 ccd=22^69 --id visit=284 ccd=61 --id visit=1206 ccd=77 --id visit=6478 ccd=99 --id visit=6528 ccd=24^67 --id visit=7344 ccd=67 --id visit=9736 ccd=67 --id visit=9868 ccd=76 --id visit=17738 ccd=69 --id visit=17750 ccd=58 --id visit=19468 ccd=69 --id visit=24308 ccd=29 --id visit=28376 ccd=69 --id visit=28380 ccd=0 --id visit=28382 ccd=101 --id visit=28392 ccd=102 --id visit=28394 ccd=93 --id visit=28396 ccd=102 --id visit=28398 ccd=95^101 --id visit=28400 ccd=5^10^15^23^26^40^53^55^61^68^77^84^89^92^93^94^95^99^100^101^102 --id visit=29324 ccd=99 --id visit=29326 ccd=47
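The "46 CCDs from 23 visits" count can be sanity-checked by parsing the data-ID string above (in butler data-ID syntax, `^` joins multiple values for one key):

```python
import re

# Count distinct visits and total CCDs in the failure list above.
id_string = (
    "--id visit=278 ccd=95 --id visit=280 ccd=22^69 --id visit=284 ccd=61 "
    "--id visit=1206 ccd=77 --id visit=6478 ccd=99 --id visit=6528 ccd=24^67 "
    "--id visit=7344 ccd=67 --id visit=9736 ccd=67 --id visit=9868 ccd=76 "
    "--id visit=17738 ccd=69 --id visit=17750 ccd=58 --id visit=19468 ccd=69 "
    "--id visit=24308 ccd=29 --id visit=28376 ccd=69 --id visit=28380 ccd=0 "
    "--id visit=28382 ccd=101 --id visit=28392 ccd=102 --id visit=28394 ccd=93 "
    "--id visit=28396 ccd=102 --id visit=28398 ccd=95^101 "
    "--id visit=28400 ccd=5^10^15^23^26^40^53^55^61^68^77^84^89^92^93^94^95^99^100^101^102 "
    "--id visit=29324 ccd=99 --id visit=29326 ccd=47"
)
pairs = re.findall(r"visit=(\d+) ccd=([\d^]+)", id_string)
n_visits = len({visit for visit, _ in pairs})
n_ccds = sum(len(ccds.split("^")) for _, ccds in pairs)
print(n_visits, n_ccds)  # 23 46
```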
WIDE: The coadd products have all 81 patches in both tracts (8766, 8767) in 5 filters, except that there is no coadd in tract 8767 patch 1,8 in HSC-R (nothing passed the PSF quality selection there); the multiband products of all 162 patches are generated.
COSMOS: The coadd products have 77 patches in tract 9813 in HSC-G, 74 in HSC-R, 79 in HSC-I, 79 in HSC-Y, 79 in HSC-Z, and 76 in NB0921; the multiband products of 79 patches are generated.
The "brightObjectMask" masks were not applied, but this should not affect the results.
forcedPhotCcd.py was not run in the RC processing.
JIRA ticket:
All processing was done with the same stack setup (i.e., without DM-10451). Data of the three layers (UDEEP, DEEP, WIDE) were processed separately.
The output repositories are at:
/datasets/hsc/repo/rerun/DM-10404/UDEEP/
/datasets/hsc/repo/rerun/DM-10404/DEEP/
/datasets/hsc/repo/rerun/DM-10404/WIDE/
All logs are at /datasets/hsc/repo/rerun/DM-10404/logs/
Although unnecessary, some edge tracts outside the PDR1 coverage were also attempted in this processing, and those outputs are kept in the repos as well. In other words, the output repositories above contain more tracts than listed in the table at the top of this page; the additional data can be ignored.
In singleFrameDriver/processCcd, there were reproducible failures in 78 CCDs from 74 visits. Their data IDs are:
--id visit=1206 ccd=77 --id visit=6342 ccd=11 --id visit=6478 ccd=99 --id visit=6528 ccd=24^67 --id visit=6542 ccd=96 --id visit=7344 ccd=67 --id visit=7356 ccd=96 --id visit=7372 ccd=29 --id visit=9736 ccd=67 --id visit=9748 ccd=96 --id visit=9838 ccd=101 --id visit=9868 ccd=76 --id visit=11414 ccd=66 --id visit=13166 ccd=20 --id visit=13178 ccd=91 --id visit=13198 ccd=84 --id visit=13288 ccd=84 --id visit=15096 ccd=47^54 --id visit=15206 ccd=100 --id visit=16064 ccd=101 --id visit=17670 ccd=24 --id visit=17672 ccd=24 --id visit=17692 ccd=8 --id visit=17736 ccd=63 --id visit=17738 ccd=69 --id visit=17750 ccd=58 --id visit=19468 ccd=69 --id visit=23680 ccd=77 --id visit=23798 ccd=76 --id visit=24308 ccd=29 --id visit=25894 ccd=68 --id visit=29324 ccd=99 --id visit=29326 ccd=47 --id visit=29936 ccd=66 --id visit=29942 ccd=96 --id visit=29966 ccd=103 --id visit=30004 ccd=95 --id visit=30704 ccd=101 --id visit=32506 ccd=8 --id visit=33862 ccd=8 --id visit=33890 ccd=61 --id visit=33934 ccd=95 --id visit=33964 ccd=101 --id visit=34332 ccd=61 --id visit=34334 ccd=61 --id visit=34412 ccd=78 --id visit=34634 ccd=61 --id visit=34636 ccd=61 --id visit=34928 ccd=61 --id visit=34930 ccd=61 --id visit=34934 ccd=101 --id visit=34936 ccd=50 --id visit=34938 ccd=95 --id visit=35852 ccd=8 --id visit=35862 ccd=61 --id visit=35916 ccd=50 --id visit=35932 ccd=95 --id visit=36640 ccd=68 --id visit=37342 ccd=78 --id visit=37538 ccd=100 --id visit=37590 ccd=85 --id visit=37988 ccd=33 --id visit=38316 ccd=11 --id visit=38328 ccd=91 --id visit=38494 ccd=6^54 --id visit=42454 ccd=24 --id visit=42510 ccd=77 --id visit=42546 ccd=93 --id visit=44060 ccd=31 --id visit=44090 ccd=27^103 --id visit=44094 ccd=101 --id visit=44162 ccd=61 --id visit=46892 ccd=64 --id visit=47004 ccd=101
Out of the 78 failures:
A rerun log of these failures is attached as singleFrameFailures.log.
In multiBandDriver, two patches of WIDE (tract=9934 patch=0,0 and tract=9938 patch=0,0) failed with an AssertionError as reported in . I excluded the failed patches from the multiBandDriver commands, and the jobs were then able to complete and process all other patches.
The multiBandDriver job of WIDE tract=9457 could not finish unless patch=1,8 was excluded. However, tract 9457 is outside the PDR1 coverage anyway.
In forcedPhotCcd, fatal errors were seen because the reference catalog of a patch did not exist; therefore some forced_src outputs were not generated. A JIRA ticket has been filed:
This section includes low-level details that may only be of interest to the operation team.
The first singleFrame job started on May 8, the last multiband job was on May 22, and the last forcedPhotCcd job was on Jun 1. The processing was done using the Verification Cluster and the GPFS space mounted on it. The NCSA team was responsible for shepherding the run and resolving non-pipeline issues, in close communication with, and with support from, the DRP team regarding the science pipelines. The "ctrl_pool"-style drivers were run on the Slurm cluster.
The processing tasks/drivers were run as a total of 8792 slurm jobs:
The figures above show the disk usage in the production scratch space, which was reserved purely for this S17B campaign. Tests and failed runs wrote to this space as well. At hour ~275, some older data in this scratch space were removed, so that drop should be ignored.
The resultant data products are archived in 4 folders at /datasets/hsc/repo/rerun/DM-10404/. In total there are 11594219 files. The largest files are typically hundreds of MB; the average size is ~14 MB. The file size distribution is shown in the plot below:
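A back-of-the-envelope check of the archive size implied by those numbers (file count times average size; the ~155 TB figure is derived here, not quoted from the run records):

```python
# Estimate total archive size: ~11.6 million files at ~14 MB average.
n_files = 11594219
avg_size_mb = 14
total_tb = n_files * avg_size_mb / 1024 / 1024  # MB -> GB -> TB
print(round(total_tb))  # ~155 TB
```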
In terms of butler dataset types, the plots below show the distributions for SFM products and others. All plots are in log scale.
More details can be found at https://jira.lsstcorp.org/browse/DM-10904
Total CPU = 79246 core-hours (~471.7 core-weeks)
Total User CPU = 76246 core-hours (~453.8 core-weeks)
The core-hours spent at each pipeline step are:
sfm: 19596.9
mosaic: 943.2
coadd: 5444.9
multiband: 34127.2
forcedPhotCcd: 19133.9
The figure below shows the "efficiency", calculated by dividing the total CPU time by (wall elapsed time × number of cores), for each pipeline step.
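That metric can be sketched as a one-liner (the job numbers in the example are illustrative, not taken from the actual accounting logs):

```python
# "Efficiency" as used in the figure: fraction of the allocated
# core-time that was actually spent on CPU work.
def efficiency(cpu_time_s, elapsed_s, n_cores):
    return cpu_time_s / (elapsed_s * n_cores)

# e.g. a job using 20000 CPU-seconds over 1000 s of wall time on 24 cores:
print(efficiency(20000.0, 1000.0, 24))  # ~0.83
```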
The Verification Cluster in its optimal state has 48 compute nodes, each with 24 physical cores and 256 GB RAM. For the duration of the S17B reprocessing, a peak of 45 compute nodes was available. The total number of node-hours used was 9383.43. The node-hours spent at each pipeline step were as follows:
The plot below does not include failed jobs or test attempts, whose generated data do not contribute directly to the final results.
A lightly modified version of this report has been turned into DMTR-31, part of the DM Test Reports collection on DocuShare.
Questions? For the LSST-DM HSC reprocessing effort there is a Slack channel, #dm-hsc-reprocessing.