Standard Collections and Collection Naming Conventions

Collection Name/Pattern	Type	Description	Initial Definition	Future Definitions
HSC/raw/all	RUN	Collection into which all raw datasets are ingested. Should almost never be used directly as an input.	Created and populated by RawIngestTask.
HSC/raw	CHAINED	Default collection from which raw datasets should be obtained.	Symbolic link to HSC/raw/all.	Symbolic link to some HSC/raw/good/<ticket>.
HSC/raw/good/<ticket>	TAGGED	A curated set of exposures without problems, named according to a ticket that provides information about how the contents were selected.
HSC/calib/<ticket>	CALIBRATION	A set of certified calibrations with validity ranges, named according to a ticket that provides information about the process of creating it. Should only rarely be used as an input collection.	Created (if necessary) and populated by ConvertRepoTask, with "gen2" as default ticket number. Created (if necessary) and populated by Instrument.writeCuratedCalibrations.
HSC/calib	CHAINED	Default collection from which certified calibrations should usually be obtained.	Symbolic link to HSC/calib/<ticket>.
HSC/calib/<ticket>/<calibDate>	RUN	Collections into which to-be-official master calibration datasets are initially written. These are generally certified into a CALIBRATION collection with the same ticket number, but may also be certified into other CALIBRATION collections (e.g. if they have not changed) or simply never certified.	Created and populated by ConvertRepoTask, with "gen2" as default ticket number.
HSC/calib/<ticket>/curated/<calibDate>	RUN	Collections into which curated calibrations are written.	Created and populated by Instrument.writeCuratedCalibrations.
HSC/calib/<ticket>/unbounded	RUN	Collections that hold (mostly curated) calibrations that have no validity ranges.	Created (if necessary) and populated by Instrument.writeCuratedCalibrations.
HSC/masks/<version>	RUN	Bright object masks created by external processes. There are official version names for the variants of these masks that are in use by the HSC collaboration, and we should use those names here for any historical masks. We should use ticket numbers (or some other versioning system that supersedes them) for new masks or non-HSC masks.	Created and populated by ConvertRepoTask, which will need to be configured with the version number.	I do not know what the Gen3 plan for direct ingest is for these, in part because I do not know in what form they have been delivered to Gen2 in the past, or what identifying metadata they have.
HSC/masks	CHAINED	Default collection from which bright object masks should be obtained.	Symbolic link to some HSC/masks/<version>.
refcats	RUN	Collection that holds all reference catalogs (which are distinguished by DatasetType, which by convention embeds a version).	Created and populated by ConvertRepoTask.	We will ultimately need a pure Gen3 way to ingest/share a reference catalog. This should not be hard, but I don't know much about what Gen2 does.
skymaps/<name>	RUN	Collection holding the `deepCoadd_skyMap` dataset for the skymap with that (dimension) name. The fact that skymap information is partially duplicated between Registry tables and this special dataset is something I consider a defect, but one to work around rather than attempt to solve until Gen2 is gone. I hope we can ultimately stop using these datasets and deprecate/remove these collections.	Created and populated by ConvertRepoTask.	Created and populated by `butler register-skymap`, which should replace pipe_tasks' `makeGen3SkyMap.py`.
HSC/defaults	CHAINED	Default input collection for HSC processing.	Chains HSC/raw, HSC/calib, HSC/masks, refcats, and skymaps/hsc_rings_v1.
u/<user>/<ticket>[/<extra>]	CHAINED	Collection of ad-hoc processing outputs (and generally inputs) produced by <user>. <ticket> is optional but highly recommended; <extra> is totally at user discretion. Normal pipetask usage will also produce RUN collections nested under this with timestamp-based names.	Maybe created and populated (on request?) by ConvertRepoTask from /datasets/hsc/repo/rerun/private/*.	Direct pure-Gen3 processing.
shared/<identifier>/<ticket>	CHAINED	Collection of common-use processing outputs (e.g. HSC RC2), with identifier reflecting both the dataset and the (approximate) software versions used.	Created and populated by ConvertRepoTask from /datasets/hsc/repo/rerun/RC2 (etc).	Direct pure-Gen3 processing.
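
As a rough illustration of how the patterns in the table could be checked mechanically, here is a minimal Python sketch. It is not part of daf_butler or any other LSST package; the names `PATTERNS`, `expected_type`, and `user_collection` are hypothetical, and the regular expressions are only my transcription of the table rows, assuming that ticket, version, and date components never contain a slash.

```python
import re

# (regex, expected collection type), transcribed from the table above.
# Hypothetical helper data, not an official definition. More specific
# patterns are listed before more general ones.
PATTERNS = [
    (r"HSC/raw/all", "RUN"),
    (r"HSC/raw/good/[^/]+", "TAGGED"),
    (r"HSC/raw", "CHAINED"),
    (r"HSC/calib/[^/]+/curated/[^/]+", "RUN"),
    (r"HSC/calib/[^/]+/unbounded", "RUN"),
    (r"HSC/calib/[^/]+/[^/]+", "RUN"),
    (r"HSC/calib/[^/]+", "CALIBRATION"),
    (r"HSC/calib", "CHAINED"),
    (r"HSC/masks/[^/]+", "RUN"),
    (r"HSC/masks", "CHAINED"),
    (r"refcats", "RUN"),
    (r"skymaps/[^/]+", "RUN"),
    (r"HSC/defaults", "CHAINED"),
    # <ticket> and <extra> are optional. This cannot tell a user-chosen
    # <extra> apart from the timestamped RUN collections that pipetask
    # nests under the same prefix.
    (r"u/[^/]+(/.+)?", "CHAINED"),
    (r"shared/[^/]+/[^/]+", "CHAINED"),
]


def expected_type(name):
    """Return the collection type the table implies for `name`, or None."""
    for pattern, collection_type in PATTERNS:
        if re.fullmatch(pattern, name):
            return collection_type
    return None


def user_collection(user, ticket=None, extra=None):
    """Build a u/<user>/<ticket>[/<extra>] name, skipping missing parts."""
    parts = ["u", user]
    if ticket:
        parts.append(ticket)
    if extra:
        parts.append(extra)
    return "/".join(parts)
```

For example, `expected_type("HSC/raw/good/DM-12345")` gives `"TAGGED"`, and `user_collection("jdoe", "DM-12345")` gives `"u/jdoe/DM-12345"` (the user and ticket values here are made up). A name that matches no row returns `None`, which may be a useful signal that it falls outside these conventions.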

Notable Omissions

Sets of raws corresponding to some observing program should be obtained via registry metadata constraints (e.g. exposure.science_program, and we should add more metadata as needed). That makes it possible to use an HSC/raw/good/<ticket> input collection (possibly via symlink) as well.
Standard-reprocessing test datasets (e.g. HSC RC2) require more thought; they are (e.g.) both sets of visits and sets of tracts, and while we can use collections of raws to approximate the former, we don't have a good way to represent the latter in the database. Maybe that's okay (we can, after all, represent those outside the database however we want), but I didn't want to assume that.
I'm still not sure what to do with the fgcmLookupTable dataset (DM-27113).
I'm not sure how much BPS output collections resemble pipetask output collections. And while we may ultimately want there to be no difference, for now I think it's most important to document those.
