See associated ticket: DM-26276.

In preparation for expanding the RC2 dataset we use for testing and monitoring, this page is a place to gather suggestions of desired features that are lacking in RC2. We will then assess what can be added to RC2 to create a new dataset (RC3) for regular reprocessing and monitoring.

Desired features in RC3

Features that would improve upon RC2 and enhance our ability to characterize DRP processing performance.

Each entry below gives the feature request, its use case, and the data needed plus any notes.
Larger contiguous area
Use case: Useful for testing spatially-dependent metrics over large areas. Also helpful for FGCM calibration?

Augment the area present in RC2 with adjacent regions. Preferably do this for all three tracts, so that the three contiguous regions are all larger, but even expanding one region would be helpful.

Consideration: is it better to expand one region by a large amount (say, from 1 tract to 4 contiguous tracts), or expand all three RC2 tracts to 2-3 tract regions each?

YA: Variety for testing robustness is more important to me than contiguous area. Arun Kannawadi, correct me if I'm wrong: the argument for contiguous area is to be able to compute correlation functions on larger spatial scales. Arun says that if the rho statistics are already small enough at those (smaller) scales, it is a safe assumption that they will not get any worse at larger angular scales.
I'd rather have 3 tracts that test 3 different scenarios (e.g. low galactic latitude, high galactic latitude, a mix of R/R2 vs. just R2 vs. just R, shallow, deep) than 3 similar tracts right next to each other. Our CI system parallelizes by tract (and treats tracts as independent), so we couldn't automatically compute a per-tract metric on angular scales larger than a tract now anyway. Tracts are roughly the size of a focal plane: 1.5 sq. deg. for HSC, with 10 sq. deg. planned for Rubin. The rho stats at tract scale will alert us if something looks weird that we need to examine on larger scales, and larger datasets should be used for those investigations.

multiple "physical" filters that map to the same "effective/abstract" filterIn 2016, HSC replaced its HSC-I and HSC-R filters with new versions (with better-behaved filter response functions), referred to as HSC-I2 and HSC-R2, respectively.  The PDR2 dataset includes observations in both old and new filters, which are coadded together in processing.  This raises the question of any potential undesirable effects on the photometry as a result of the combination.  This is especially important given that the fields in the sample will have differing "Filter Fraction"s. 

PDR2 included data from HSC-I/HSC-I2 and HSC-R/HSC-R2. An obvious field in which to include the multiple filters is COSMOS tract 9813, which has near-equal contributions from both filters (https://hsc-release.mtk.nao.ac.jp/doc/index.php/filterfrac-2/#DUD-COSMOS).

Considerations:

Is just including the full PDR2 dataset (i.e. a single set of coadds having a mixture of the HSC-I/HSC-I2 & HSC-R/HSC-R2 filters) enough?

Should we have a full suite of three independent COSMOS 9813 coadds covering all possible permutations (and should they all reach roughly the same effective depth):

  • pure HSC-I & HSC-R,
  • pure HSC-I2 & HSC-R2, and
  • HSC-I/HSC-I2 & HSC-R/HSC-R2 mixtures

Are all three worthwhile given the amount of compute (and potential bookkeeping/workflow headaches) they will use up? Is COSMOS a reasonable choice given the small dithering pattern?

From Eli Rykoff: Oh my, that would be a workflow issue, but it would test our stuff! I think we should definitely continue with COSMOS (using both as the baseline), and then a tract with r2/i2 and a tract with r/i at a minimum.

YA: We can avoid the workflow headache by choosing 3 different wide tracts (one with pure I/R, one with pure I2/R2, and one with a mix) instead of one tract with different permutations.

Multiple coadds of the same tract with increasing numbers of input visits/depths
Use case: The efficacy of many of our pipeline algorithms is sensitive to the number of input visits (e.g. artifact identification and rejection) or the effective image depth (e.g. deblending, and thus model fitting and shape measurements). There may even be a danger of diminishing returns with added depth for the latter.

What would be an optimal tract to do this?  

How many/which filters would this ideally be done on?

Can we come up with suitable metrics to contrast and compare (would some level of fake-insertion be required)?

As above, would this be worthwhile when weighed against the added workflow hassle?

Template for AP difference imaging of COSMOS
Use case: AP is now performing image differencing on the HSC COSMOS field (tract 9813); it would be valuable to exercise the DRP-produced template → AP image differencing flow that we will use during operations.

The visits currently used by AP to construct COSMOS templates are listed in DM-22431 and DM-24251:

HSC-G
59150 59152 59154 59156 59158 59160 90750 90752 90754 95148 95158 95168 95178 97582 101418 101438 101450 101470 101500 103054 103076 104856 104866 106058 106068 106078 106130 107872 107882 107892 107902 113174 113176 113178 113180

HSC-R2
91554^91556^91558^91560^91560^91564^96850^96888^96936

HSC-I2
90770^90772^90774^90776^90778^90780^90782^90784

HSC-Z
17928^17908^17906^17926^17904^17902^17930^17962^17946^17944

HSC-Y
22602^22604^22644^22632^22628^22642^22626^22630^22664^11730^22662^22606^11736^22646^22660^11718^11732

Artificial sources inserted into the data
Use case: Testing measurement algorithms, recovery of "true" (input) properties, biases in measurements, etc.
Notes: This should include fake galaxies and stars at minimum. The galaxies should include a variety of morphologies and surface brightnesses. Should we also include variable sources?
Region with a long enough time baseline to test proper motions and parallax
Use case: Testing astrometry in the presence of non-negligible proper motions.
Notes: Long time baseline between observations of the field. It may be that there are other datasets besides HSC-SSP that are better suited for this test. This is something that may be difficult to determine directly from commissioning data (given the expected schedule), so it would be useful to have done testing on precursor datasets.
Include ~all COSMOS visits
Use case: "full-depth" testing of pipelines.

Can we reprocess all the COSMOS visits regularly?

Include data with rotational dithers
Use case: Rotational dithering is strongly requested, by the lensing groups in particular, for the Rubin/LSST survey to help mitigate shape biases tied to the focal plane (e.g. brighter-fatter residuals).

There is public University of Hawaii (UH) data in COSMOS (in the HSC-G, HSC-I, HSC-Z, and HSC-Y filters, but unfortunately no R band; see https://ui.adsabs.harvard.edu/abs/2017arXiv170600566T/abstract) that is already ingested at NCSA:

$ sqlite3 /datasets/hsc/repo/registry.sqlite3

sqlite> select filter, pa, COUNT(*) from raw WHERE ccd=49 AND field="COSMOS" AND dataType="OBJECT" GROUP BY pa, filter ORDER BY filter;
HSC-G|90.0|21
HSC-I|-180.0|10
HSC-I|-90.0|10
HSC-I|0.0|9
HSC-I|90.0|30
HSC-Y|-180.0|20
HSC-Y|-90.0|15
HSC-Y|0.0|15
HSC-Y|90.0|17
HSC-Z|90.0|21
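
To turn that registry query into per-PA visit lists (e.g. for building rotationally-dithered selections), a minimal Python sketch along the following lines should work; it assumes the raw table also carries a visit column, as in standard Gen2 registries.

from collections import defaultdict
import sqlite3

# Same selection as the interactive query above, but keeping the visit numbers.
conn = sqlite3.connect("/datasets/hsc/repo/registry.sqlite3")
rows = conn.execute(
    "SELECT filter, pa, visit FROM raw "
    "WHERE ccd=49 AND field='COSMOS' AND dataType='OBJECT' "
    "ORDER BY filter, pa, visit"
)

# Group visits by (filter, position angle) and print them in the
# '^'-separated form used for the Gen2 visit lists earlier on this page.
visits = defaultdict(list)
for filt, pa, visit in rows:
    visits[(filt, pa)].append(str(visit))

for (filt, pa), vlist in sorted(visits.items()):
    print(f"{filt} pa={pa}: {'^'.join(vlist)}")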

Comments from HSC-ers (Masayuki Tanaka, in particular):

  • You may want to include a range of seeing for the PSF test.
  • The tests below probably go too far for RC3 (they are in a sense for science and may not be worth running every 2 weeks), but they are tests that HSC has done:
    • We generate three COSMOS stacks at the Wide depth with bad/median/good seeing sizes. This allows you to characterize how things change with seeing. Might be useful when you make a major change in, e.g., the deblender.
    • Two COSMOS stacks with similar seeing and depth but using completely different sets of visits. The pixels in these two stacks are independent and will be useful to characterize how good your uncertainty estimates are.
  • We need a sufficient number of non-PSF stars to characterize PSF-modelling errors as a function of magnitude. Would a low-latitude field or some other region with high stellar density help in this respect?
  • Tracts covered with 5 dithers and tracts covered with 6 dithers should both be included.

Are there specific dither patterns (larger than those in COSMOS) that would help test astrometry?

(Ask J. Parejko or Clare?) Also get rotational dithers for testing the astrometric fitter if possible.

Computing/processing/storage considerations:

In addition to including features useful for data quality monitoring, the compute resources and increased time that a larger dataset will take to process are important concerns. We collect thoughts on these issues in this section.

YA: While we are still on Gen2, keep to 1 rerun; i.e., test deep, medium, and shallow conditions by processing deep, medium, and shallow tracts rather than 1 tract in 3 reruns corresponding to 3 different visit lists.

Primary limitation is processing time.

  • CPU-hours: approximately linear with the number of visits. Given that it takes a week to run now, I don't see how we could comfortably process more than 2x the number of visits. We make decisions based on the metrics; I value processing turnaround more than the number of visits. I would like to keep the processing time under 2 weeks, please. Remember that we plan on adding processing steps: image differencing and forcedPhotDiffim.
  • Human hours: Workflow simplicity is the priority here. 


Proposal:

Because the COSMOS field lies within a larger WIDE region of the HSC-SSP, we propose to include all COSMOS data in RC3, plus additional contiguous tracts from the WIDE footprint that create a contiguous field extending to the "edge" of the survey footprint. This enables all of the following:

  • Full survey depth coadds in the COSMOS field
  • COSMOS "truth" table of deep HST galaxy measurements for comparison
  • COSMOS provides a long time baseline over which to validate parallax/proper motion algorithms (though the lack of dithering may be an issue?)
  • COSMOS has data from both HSC-I/HSC-I2 and also HSC-R/HSC-R2. We can thus test processing on, e.g., only HSC-I, only HSC-I2, or the combination of them both.
  • The large number of visits in COSMOS means we can create independent coadds consisting of separate sets of visits.
  • Extending over a large area provides a dataset to use in developing QA tools
  • Extends to the edge of the survey footprint
  • Can use WIDE data when proper dithering is required, but COSMOS data when depth is more important

Caveats/questions:

  • Little variation in declination or Galactic latitude. May need some Subaru+HSC PI data to get higher source densities.
  • We could consider cherry-picking some region(s) of the sky with, e.g., a known rich galaxy cluster (e.g. RC2's 9615 was selected for this reason + a big galaxy; see 3-color image attached to DM-11345), or some Galactic cirrus, or other features if we want to exercise/test specific algorithms and capabilities.


The following shows how the WIDE and DEEP/UDEEP visits are distributed in this region. The filter fractions are 100% i/r for the WIDE field in the COSMOS region, but the UDEEP is of order 50/50 (so the question for Eli is whether dithering/areal coverage limitations will hamper the i/i2 and r/r2 calibration assessments).



3 Comments

  1. Yusra AlSayyad, do we really intend to do this processing with the Gen2 butler? I would think that we should take the opportunity to change over to Gen3. Also, presumably the processing will not take as long with the Gen3 middleware, which I hope would mitigate some of your concerns.

  2. I don't care which transition we do first, but RC2 → RC3 and gen2 → gen3 need to be two separate transitions. The transition from gen2 to gen3 RC reprocessing will entail running both the gen2 and gen3 pipelines for a couple of months until we're sure they're identical. Likewise, the transition from RC2 to RC3 will entail running both RC2 and RC3 for a couple of months. The overlap months will tie them together, which gives us a direct comparison between the new and the old.

  3. Fake source injection is incredibly useful for validation of our pipeline, allowing metrics such as sky over-/under-subtraction to be continuously calculated. With RC3, it might be beneficial for one tract to be continuously processed with a fake source catalogue designed to span the entire parameter space of interest (e.g., very faint to very bright, very compact to very extended, etc.). Ideally, this input catalogue should cover not only realistic data regimes but also push into extreme and unrealistic regimes, allowing us to test the limits of where our data analyses begin to fail. Caveat: in the paragraphs below, I'm considering only fake extended (Sersic) source injection, and not point sources; these are also important, and should/could be added to any extended fake source catalogue defined here.

    A single Sersic function has 7 degrees of freedom: x (RA), y (Dec), magnitude, half-light radius, position angle, ellipticity and Sersic index. Our fake source routines allow for both single-Sersic and double-Sersic fake source injection (with 12 degrees of freedom, as disk centroids are locked to bulge centroids).

    Allowing for x, y and position angle to be pseudo-randomly allocated, a single-Sersic fake parameter grid has 4 axes (mag, Re, e and n). An example grid of input values in this space could be:

    mag = [15, 20, 25, 30]
    Re = [1, 2, 4, 8] arcsec
    e = [0, 0.4, 0.8]
    n = [0.5, 1, 2, 4, 6]

    This 4D grid contains 240 points. Each point needs to be evaluated N times such that final combined analyses are statistically significant; e.g., N=50 results in an input fakes catalogue containing 12000 sources.
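
    As a minimal sketch using only the example values above (the dictionary keys and the uniform position-angle draw are illustrative assumptions, not the actual fake-source catalogue schema), the grid and its N-fold replication could be enumerated as:

    import itertools
    import random

    mags = [15, 20, 25, 30]
    radii = [1, 2, 4, 8]          # half-light radius Re, arcsec
    ellips = [0.0, 0.4, 0.8]
    sersic_n = [0.5, 1, 2, 4, 6]
    n_repeats = 50                # evaluations per grid point

    grid = list(itertools.product(mags, radii, ellips, sersic_n))
    print(len(grid))               # 4 * 4 * 3 * 5 = 240 grid points
    print(len(grid) * n_repeats)   # 240 * 50 = 12000 fake sources

    # Each repeat gets its own pseudo-random position angle (and, in practice,
    # pseudo-random RA/Dec within the tract footprint).
    catalogue = [
        {"mag": m, "r_half": r, "ellip": e, "n": n,
         "pa": random.uniform(0.0, 180.0)}
        for (m, r, e, n) in grid
        for _ in range(n_repeats)
    ]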

    Unfortunately, the same procedure applied to a double-Sersic input fakes catalogue produces an 8D grid of 57600 points. This is likely too complex to be useful/feasible, requiring some further consideration on how we reduce the complexity of input double-Sersic models to maximise potential validation analyses (e.g., fewer grid steps, linking disk Re/mag to bulge Re/mag, etc).
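
    As one illustrative sketch of the "linking disk Re/mag to bulge Re/mag" option (the magnitude offset, size ratio, and fixed disk index below are assumptions for illustration only, not proposed values), tying the disk magnitude and size to the bulge and fixing the disk Sersic index at n = 1 leaves only the disk ellipticity as a new axis:

    import itertools

    # Bulge grid as above: 4 * 4 * 3 * 5 = 240 combinations.
    bulge_grid = itertools.product(
        [15, 20, 25, 30], [1, 2, 4, 8], [0.0, 0.4, 0.8], [0.5, 1, 2, 4, 6])

    reduced = [
        {"mag_bulge": m, "r_bulge": r, "e_bulge": e, "n_bulge": n,
         "mag_disk": m + 0.75,    # assumed fixed bulge-to-disk magnitude offset
         "r_disk": 2.0 * r,       # assumed fixed bulge-to-disk size ratio
         "e_disk": e_d,
         "n_disk": 1.0}           # exponential disk
        for (m, r, e, n) in bulge_grid
        for e_d in [0.0, 0.4, 0.8]
    ]
    print(len(reduced))  # 240 * 3 = 720 points, versus 240**2 = 57600 for the full 8D grid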