Collection names below use "HSC" both as an example instrument name and because it is the most complete case we have, but the conventions are expected to transfer to any instrument (or, in some cases, to be relevant only to HSC).

The term "symbolic link" is used here as shorthand for a CHAINED collection with a single child collection.
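The "symbolic link" idea can be illustrated with a small in-memory model. This is a hypothetical sketch, not the daf_butler Registry API: the ToyCollections class, its methods, and the ticket DM-00000 are placeholders invented for illustration. It shows the property that makes the convention useful, namely that the link's name stays fixed while its single child can be swapped.

```python
from enum import Enum


class CollectionType(Enum):
    """Hypothetical stand-in for the collection types used on this page."""
    RUN = "RUN"
    TAGGED = "TAGGED"
    CALIBRATION = "CALIBRATION"
    CHAINED = "CHAINED"


class ToyCollections:
    """Hypothetical in-memory model of collections; not the daf_butler Registry API."""

    def __init__(self):
        self.types = {}   # collection name -> CollectionType
        self.chains = {}  # CHAINED collection name -> ordered child names

    def register(self, name, ctype):
        self.types[name] = ctype
        if ctype is CollectionType.CHAINED:
            self.chains[name] = []

    def link(self, name, target):
        """Point a CHAINED collection at exactly one child, i.e. a "symbolic link"."""
        if self.types.get(name) is not CollectionType.CHAINED:
            raise TypeError(f"{name!r} is not a CHAINED collection")
        if target not in self.types:
            raise KeyError(f"unknown collection {target!r}")
        self.chains[name] = [target]

    def target(self, name):
        """Return the single child of a symbolic link."""
        (child,) = self.chains[name]  # raises ValueError unless there is exactly one child
        return child


collections = ToyCollections()
collections.register("HSC/raw/all", CollectionType.RUN)
collections.register("HSC/raw/good/DM-00000", CollectionType.TAGGED)  # hypothetical ticket
collections.register("HSC/raw", CollectionType.CHAINED)

# Initial definition: HSC/raw is a symbolic link to HSC/raw/all.
collections.link("HSC/raw", "HSC/raw/all")
print(collections.target("HSC/raw"))  # HSC/raw/all

# Future definition: retarget the link; anyone reading from HSC/raw is unaffected.
collections.link("HSC/raw", "HSC/raw/good/DM-00000")
print(collections.target("HSC/raw"))  # HSC/raw/good/DM-00000
```

Because readers only ever name HSC/raw, moving it from HSC/raw/all to a curated HSC/raw/good/<ticket> collection requires no change on their side.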

When converting Gen2 repos that did not use a convention of tracking variants with ticket numbers, we will simply use "gen2" where a ticket number would typically be used in pure-Gen3 collections. When a ticket number exists for the conversion itself, we should use that instead.
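A minimal sketch of how the "gen2" fallback might be applied when constructing ticket-versioned collection names. The function versioned_collection, its parameters, and the ticket DM-00000 are hypothetical (they are not part of ConvertRepoTask); the sketch assumes that a ticket for the conversion itself, when one exists, always takes precedence.

```python
from typing import Optional

# Placeholder used in the ticket slot for converted Gen2 repos.
GEN2_PLACEHOLDER = "gen2"


def versioned_collection(instrument: str, category: str,
                         conversion_ticket: Optional[str] = None,
                         suffix: str = "") -> str:
    """Build "<instrument>/<category>/<ticket>[/<suffix>]" (hypothetical helper).

    If a ticket exists for the conversion itself it is used; otherwise the
    ticket slot is filled with "gen2".
    """
    ticket = conversion_ticket if conversion_ticket else GEN2_PLACEHOLDER
    parts = [instrument, category, ticket]
    if suffix:
        parts.append(suffix)
    return "/".join(parts)


print(versioned_collection("HSC", "calib"))                      # HSC/calib/gen2
print(versioned_collection("HSC", "calib", "DM-00000"))          # HSC/calib/DM-00000
print(versioned_collection("HSC", "calib", suffix="unbounded"))  # HSC/calib/gen2/unbounded
```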

Standard collections and collection naming conventions

| Collection name/pattern | Type | Description | Initial definition | Future definitions |
|---|---|---|---|---|
| HSC/raw/all | RUN | Collection into which all raw datasets are ingested. Should almost never be used directly as an input. | Created and populated by RawIngestTask. | |
| HSC/raw | CHAINED | Default collection from which raw datasets should be obtained. | Symbolic link to HSC/raw/all. | Symbolic link to some HSC/raw/good/<ticket>. |
| HSC/raw/good/<ticket> | TAGGED | A curated set of exposures without problems, named according to a ticket that provides information about how the contents were selected. | | |
| HSC/calib/<ticket> | CALIBRATION | A set of certified calibrations with validity ranges, named according to a ticket that provides information about the process of creating it. Should only rarely be used as an input collection. | Created (if necessary) and populated by ConvertRepoTask, with "gen2" as the default ticket number. Also created (if necessary) and populated by Instrument.writeCuratedCalibrations. | |
| HSC/calib | CHAINED | Default collection from which certified calibrations should usually be obtained. | Symbolic link to HSC/calib/<ticket>. | |
| HSC/calib/<ticket>/<calibDate> | RUN | Collections into which to-be-official master calibration datasets are initially written. These are generally certified into a CALIBRATION collection with the same ticket number, but may also be certified into other CALIBRATION collections (e.g. if they have not changed) or never certified at all. | Created and populated by ConvertRepoTask, with "gen2" as the default ticket number. | |
| HSC/calib/<ticket>/curated/<calibDate> | RUN | Collections into which curated calibrations are written. | Created and populated by Instrument.writeCuratedCalibrations. | |
| HSC/calib/<ticket>/unbounded | RUN | Collections that hold (mostly curated) calibrations that have no validity ranges. | Created (if necessary) and populated by Instrument.writeCuratedCalibrations. | |
| HSC/masks/<version> | RUN | Bright object masks created by external processes. There are official version names for the variants of these masks that are in use by the HSC collaboration, and we should use those names here for any historical masks. We should use ticket numbers (or some other versioning system that supersedes them) for new masks or non-HSC masks. | Created and populated by ConvertRepoTask, which will need to be configured with the version number. | I do not know what the Gen3 plan for direct ingest is for these, in part because I do not know in what form they have been delivered to Gen2 in the past, or what identifying metadata they have. |
| HSC/masks | CHAINED | Default collection from which bright object masks should be obtained. | Symbolic link to some HSC/masks/<version>. | |
| refcats | RUN | Collection that holds all reference catalogs (which are distinguished by DatasetType, which by convention embeds a version). | Created and populated by ConvertRepoTask. | We will ultimately need a pure-Gen3 way to ingest/share a reference catalog. This should not be hard, but I don't know much about what Gen2 does. |
| skymaps/<skymap> | RUN | Collection holding the deepCoadd_skyMap dataset for the skymap with that (dimension) name. The fact that skymap information is partially duplicated between Registry tables and this special dataset is something I consider a defect, but one to work around rather than attempt to solve until Gen2 is gone. I hope we can ultimately stop using these datasets and deprecate/remove these collections. | Created and populated by ConvertRepoTask. | Created and populated by butler register-skymap, which should replace pipe_task's . |
| HSC/defaults | CHAINED | Default input collection for HSC processing. | Chains HSC/raw, HSC/calib, HSC/masks, refcats, and skymaps/hsc_rings_v1. | |
| u/<user>/<ticket>[/<extra>] | CHAINED | Collection of ad-hoc processing outputs (and generally inputs) produced by <user>. <ticket> is optional but highly recommended; <extra> is entirely at the user's discretion. Normal pipetask usage will also produce RUN collections nested under this with timestamp-based names. | Possibly created and populated (on request?) by ConvertRepoTask from /datasets/hsc/repo/rerun/private/*. | Direct pure-Gen3 processing. |
| shared/<identifier>/<ticket> | CHAINED | Collection of common-use processing outputs (e.g. HSC RC2), with an identifier reflecting both the dataset and the (approximate) software versions used. | Created and populated by ConvertRepoTask from /datasets/hsc/repo/rerun/RC2 (etc.). | Direct pure-Gen3 processing. |
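Because HSC/defaults chains other chains, the order in which its collections are actually searched comes from expanding each child in turn. The sketch below models this with plain dictionaries, using the initial definitions from the table. It is a hypothetical illustration, not how daf_butler stores or resolves chains: it assumes depth-first, in-order expansion in which a repeated collection keeps only its first position, and it uses "gen2" for the calibration ticket and a literal "<version>" placeholder for the masks, since no concrete version is given here.

```python
# Hypothetical chain definitions mirroring the "initial definition" column.
CHAINS = {
    "HSC/defaults": ["HSC/raw", "HSC/calib", "HSC/masks", "refcats", "skymaps/hsc_rings_v1"],
    "HSC/raw": ["HSC/raw/all"],
    "HSC/calib": ["HSC/calib/gen2"],
    "HSC/masks": ["HSC/masks/<version>"],  # placeholder: no concrete version is given
}


def flatten(name, chains, _path=()):
    """Expand a collection into non-chained collections, depth-first and in order.

    A collection reached more than once keeps only its first position, and a
    chain that (directly or indirectly) contains itself raises ValueError.
    """
    if name in _path:
        raise ValueError(f"cycle detected: {' -> '.join(_path + (name,))}")
    if name not in chains:
        return [name]  # not a chain: a RUN, TAGGED or CALIBRATION collection
    result = []
    for child in chains[name]:
        for leaf in flatten(child, chains, _path + (name,)):
            if leaf not in result:
                result.append(leaf)
    return result


print(flatten("HSC/defaults", CHAINS))
# ['HSC/raw/all', 'HSC/calib/gen2', 'HSC/masks/<version>', 'refcats', 'skymaps/hsc_rings_v1']
```

Retargeting any of the single-child links (for example HSC/raw) changes this flattened search order without touching the definition of HSC/defaults itself.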

Notable Omissions

  • Sets of raws corresponding to some observing program should be obtained via registry metadata constraints (e.g. exposure.science_program) rather than dedicated collections, and we should add more metadata as needed. That also makes it possible to combine such a constraint with an HSC/raw/good/<ticket> input collection (possibly via a symbolic link).
  • Standard-reprocessing test datasets (e.g. HSC RC2) require more thought: they are both sets of visits and sets of tracts, and while we can use collections of raws to approximate the former, we don't have a good way to represent the latter in the database. Maybe that's okay (we can, after all, represent those outside the database however we want), but I didn't want to assume that.
  • I'm still not sure what to do with the fgcmLookupTable dataset (DM-27113).
  • I'm not sure how much BPS output collections resemble pipetask output collections. While we may ultimately want there to be no difference, for now I think it's most important to document the differences.
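The approach in the first bullet (select a program's raws by metadata instead of maintaining a per-program collection, optionally intersected with a curated HSC/raw/good/<ticket> collection) can be sketched with plain records. The ExposureRecord class, the exposure IDs, and the program names are hypothetical; against a real Registry this would presumably be a query with a where-expression on exposure.science_program rather than a Python filter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExposureRecord:
    """Hypothetical stand-in for an exposure dimension record."""
    exposure_id: int
    science_program: str


# Hypothetical raws ingested into HSC/raw/all (one record per exposure).
RAW_ALL = [
    ExposureRecord(101, "PROGRAM_A"),
    ExposureRecord(102, "PROGRAM_A"),
    ExposureRecord(103, "PROGRAM_B"),
    ExposureRecord(104, "PROGRAM_A"),
]

# Hypothetical contents of a curated HSC/raw/good/<ticket> TAGGED collection.
RAW_GOOD = {101, 103, 104}


def select_raws(records, science_program, good_ids=None):
    """Select exposures by metadata, optionally restricted to a curated set."""
    selected = [r.exposure_id for r in records if r.science_program == science_program]
    if good_ids is not None:
        selected = [i for i in selected if i in good_ids]
    return selected


print(select_raws(RAW_ALL, "PROGRAM_A"))            # [101, 102, 104]
print(select_raws(RAW_ALL, "PROGRAM_A", RAW_GOOD))  # [101, 104]
```

Keeping the program selection in metadata means the same curated "good" collection can serve every program, instead of multiplying collections per program and per curation ticket.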
