IVOA standards, and the CAOM2 data model, call for a variety of metadata identifiers for "static" information about the entities and equipment associated with astronomical data.

With the Vera C. Rubin Observatory renaming in mind, now is a good time to try to nail down some of this information.

This page does not directly address some closely related questions that arise in setting values for FITS headers in our raw data files. Readers are encouraged to comment on any points from the FITS header model that affect the issues and proposed decisions on this page.

TL;DR: Proposal

ObsCore

  • facility_name values:
    • For all main telescope observations: "Rubin-SST" (for "Simonyi Survey Telescope")
    • For all AuxTel observations: "Rubin-AuxTel" (in preference to "Rubin-AT", despite breaking parallel construction)
    • Possibly other "Rubin-*" values for other instrumentation. For instance, the all-sky camera(s) could have their own values, or they could share "Rubin-SST" with a different instrument_name; this is TBD. The deciding factor may be the link with the telescope location coordinates in CAOM2, discussed below.
  • instrument_name values:
    • "LSSTCam"
    • "LATISS"
    • Additional names, TBD, for any other instrumentation for which we wish to provide ObsTAP/SIAv2 access, e.g., the all-sky camera(s)
  • obs_collection values:
    • "LSST-*" for all released data collections arising from the post-commissioning Legacy Survey of Space and Time
      • "LSST-DRnn-*" for all data collections arising from Data Releases, e.g., "LSST-DR01-*"
      • "LSST-Prompt-*" for all data collections containing Prompt data products
    • Within this framework, the rest of the names remain to be defined.
    • Project-created data products from Special Programs should fall within this framework, but may have an additional field in the obs_collection value following the above prefixes, e.g., "LSST-DR01-SPxx-*".
    • Collection name values for any User-Generated data products visible through the main ObsTAP/SIAv2 services are TBD, as is whether this will happen at all, or whether User-Generated products will have their own services.
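As a rough illustration, the proposed values could be checked mechanically, for instance in a unit test of whatever code populates the ObsCore table. This is a minimal sketch under stated assumptions: the sets contain only the values proposed above (TBD values such as the all-sky camera names are left out), the "*" wildcards are rendered as a regular expression, two-digit release numbers ("DR01") are assumed, and the function name check_obscore_names is hypothetical.

```python
import re

# Only the values proposed above; TBD names (e.g., all-sky cameras) are omitted.
FACILITY_NAMES = {"Rubin-SST", "Rubin-AuxTel"}
INSTRUMENT_NAMES = {"LSSTCam", "LATISS"}

# Assumed rendering of "LSST-DRnn-*", "LSST-Prompt-*" and the optional
# Special Programs field ("LSST-DR01-SPxx-*"). Two-digit release numbers
# and the "SP" + word-characters form are assumptions, not decisions.
COLLECTION_PATTERN = re.compile(r"^LSST-(?:DR\d{2}|Prompt)-(?:SP\w+-)?.+$")


def check_obscore_names(facility, instrument, collection):
    """Return a list of problems; an empty list means the values conform."""
    problems = []
    if facility not in FACILITY_NAMES:
        problems.append(f"unexpected facility_name {facility!r}")
    if instrument not in INSTRUMENT_NAMES:
        problems.append(f"unexpected instrument_name {instrument!r}")
    if COLLECTION_PATTERN.match(collection) is None:
        problems.append(f"obs_collection {collection!r} does not follow the LSST-* scheme")
    return problems


# Example: "Rubin-AT" is rejected in favour of "Rubin-AuxTel".
print(check_obscore_names("Rubin-AT", "LATISS", "LSST-Prompt-images"))
```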

CAOM2

(Proposal still being developed)

ObsCore data model

In the ObsCore data model for observation metadata, there are several metadata attributes that contain this sort of "static" information:

  • facility_name : "Name of the facility used for this observation"
    "The Facility class codes information about the observatory or facility used to collect the data. In this model we define one attribute of Utype Provenance.ObsConfig.facility.name which re-uses the Facility concept defined in the VODataService specification."
  • instrument_name : "Name of the instrument used for this observation"
    "The name of the instrument used for the acquisition of the observation. It is given in the model as Provenance.ObsConfig.instrument.name and encoded as a string."
  • obs_collection : "Name of the data collection"
    "The obs_collection column identifies the data collection to which the data product belongs. A data collection is any logical collection of datasets which are alike in some fashion. Typical data collections might be all the data from a particular telescope, instrument, or survey. The value is either the registered shortname for the data collection, the full registered IVOA identifier for the collection, or a data provider defined shortname for the collection."  Additional guidance in this section of the standard suggests the intent that a collection name be intelligible as coming from a particular facility/instrument pair, when that makes sense for its content.

Spaces are acceptable in the name strings, and are found "in the wild" in existing large astronomical archives offering ObsTAP service.

CAOM2 data model

The CAOM2 data model (version 2.4) has the following almost directly equivalent attributes of this nature:

  • Observation.telescope, a composite object of type Telescope representing "the telescope or facility where this observation was created", also "the telescope used to acquire the data for an observation". That object in turn has an attribute Telescope.name, "common name of the telescope; TBD: reference to a standard list of names?". That attribute is generally taken as mappable to ObsCore's facility_name when CAOM2 data is used to support an ObsCore-type service such as SIAv2 or ObsTAP.
  • Observation.instrument, again a composite object, of type Instrument, representing "the instrument or detector used to acquire the data", also "the instrument used to acquire or create the observation; this could be used for both physical instruments that acquire data or software that generates it (e.g. simulated data)". That object in turn has an attribute Instrument.name, "common name for the instrument". This clearly maps to ObsCore instrument_name.
  • Observation.collection, a string attribute: "the name of the data collection this observation belongs to".

In addition, CAOM2 has attributes for the physical coordinates of the "Telescope" (geoLocationX/Y/Z), and it allows the association of data-provider-defined keywords with both the Telescope and Instrument objects.

Data to be represented

The model adopted for Rubin/LSST should accommodate at least the following pairings of actual physical telescopes and instruments (presented here without implying how they will be represented):

  • 8.4 m Simonyi telescope
    • LSSTCam instrument, with a variety of observing modes that will produce science sensor images and wavefront sensor images as part of standard (2x15 s) or alternate-standard (1x30 s) visits, as well as (potentially, depending on mode) either “normal” exposures on the guide sensors as part of visits, or guider postage stamps at, nominally, 9 Hz, possibly available as “time cubes”
    • ComCam instrument, with a variety of observing modes but most commonly 9-CCD coordinated exposures
    • (possibly for some early testing of M3) Shack-Hartmann wavefront sensing instrument
  • AuxTel (ex-Calypso) telescope
    • LATISS instrument, with a variety of imaging and spectroscopic observing modes
    • Pilot camera used in early AuxTel operations
    • Pilot spectrograph (two? red/blue)
  • All-sky camera (visible light) - images will initially be in LFA but could (and should! - gpdf) eventually be made available through IVOA services
  • All-sky camera (infrared light) - Chuck Claver tells me this is doubtful at first light but will eventually show up
  • Test stand data (possibly only relevant for internal services)

Discussion

Note that these two data models and the mappings conventionally applied between the two do not appear to distinguish between the "facility" and the "telescope" as separately settable attributes and therefore do not readily permit us to separately specify "Rubin Observatory" and "Simonyi Survey Telescope".  Nor, therefore, do they let us have a single (presumably "Rubin"-esque) facility name simultaneously with multiple telescope names (Simonyi and AuxTel/ex-Calypso).

Within the CAOM2 data model, the presence of the geoLocationX/Y/Z coordinates and the non-trivial separation of the two Rubin Observatory telescopes suggest that it may be preferable to have distinct identifiers for the two telescopes, so that their locations can be distinguished sensibly.
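To make the location argument concrete: CAOM2's geoLocationX/Y/Z are, as we understand the model, geocentric Cartesian coordinates in metres, so two telescopes on the same summit get distinguishable triples. The minimal sketch below converts a geodetic latitude, longitude and height into such a triple using the standard WGS84 ellipsoid formulas, and measures the straight-line separation between two points. The function names are hypothetical, and real inputs would come from surveyed positions, not from this page.

```python
import math

# WGS84 reference ellipsoid
WGS84_A = 6378137.0                    # semi-major axis [m]
WGS84_F = 1.0 / 298.257223563          # flattening
WGS84_E2 = WGS84_F * (2.0 - WGS84_F)   # first eccentricity squared


def geodetic_to_geocentric(lat_deg, lon_deg, height_m):
    """Geodetic latitude/longitude [deg] and ellipsoidal height [m] -> (X, Y, Z) [m]."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    # Prime-vertical radius of curvature at this latitude
    n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * math.sin(lat) ** 2)
    x = (n + height_m) * math.cos(lat) * math.cos(lon)
    y = (n + height_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - WGS84_E2) + height_m) * math.sin(lat)
    return x, y, z


def separation_m(p, q):
    """Straight-line (chord) distance between two geocentric points [m]."""
    return math.dist(p, q)
```

Feeding in the surveyed positions of the Simonyi telescope and AuxTel would yield two different triples; a single shared Telescope object would have to discard one of them.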

In any event, the practical options seem to be:

  1. Adopt a single facility_name value, e.g., "Rubin Observatory", for all the above data, and, in ObsCore, distinguish the different sources of data only by instrument_name (and, presumably, obs_collection).
    1. Create a single CAOM2 Telescope object for all the data, compromising by associating all data with the Simonyi Survey Telescope's physical location.
    2. Create distinct CAOM2 Telescope objects as appropriate, with appropriate physical location coordinates, but all with the same name, e.g., "Rubin Observatory". This seems likely to confuse users who may expect Telescope.name to be usable as a primary key for Telescope objects.
    3. Create distinct CAOM2 Telescope objects as appropriate, with appropriate physical location coordinates and with different names. This requires a special heuristic in the code that generates the ObsCore view onto the CAOM2 tables (the conventional way to support both the CAOM2 and ObsCore data models on the same data) to map all the different Telescope.name values to a single ObsCore facility_name string.
  2. Adopt different facility_name values, with a common prefix, for each of the Observatory's "telescopes" (four of which are enumerated above), plus one for test stands generically. Assign unique instrument_name values, as appropriate, to all the data sources. Have the CAOM2 Telescope.name values match the ObsCore facility_name values exactly.
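The practical difference between option 1.3 and option 2 shows up in the code that generates the ObsCore view from CAOM2. A minimal sketch, with hypothetical function names and with the telescope names in the lookup table chosen only for illustration: option 1.3 needs a hand-maintained lookup that collapses several Telescope.name values into one facility_name, while option 2 is a pass-through.

```python
# Option 1.3: distinct CAOM2 Telescope.name values, one ObsCore facility_name.
# The keys are illustrative names only; the page has not chosen them.
OPTION_1_3_FACILITY = {
    "Simonyi Survey Telescope": "Rubin Observatory",
    "AuxTel": "Rubin Observatory",
    "All-sky camera": "Rubin Observatory",
}


def facility_name_option_1_3(telescope_name):
    # A Telescope.name missing from the table raises KeyError, so every new
    # telescope requires updating this special-case mapping.
    return OPTION_1_3_FACILITY[telescope_name]


def facility_name_option_2(telescope_name):
    # Option 2: Telescope.name already equals facility_name; copy it through.
    return telescope_name
```

The lookup table in option 1.3 is small, but it is one more artifact that must be kept in sync whenever a telescope is added.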

Option 2 seems simpler to implement and easier for users to understand. If the common-prefix approach were taken, queries against all the Rubin/LSST data could be constructed in ObsTAP with a LIKE 'Rubin%' test, while in SIAv2 they would require explicit enumeration of the acceptable values, using the DALI rule that repeated query parameters represent a logical OR, e.g., ...&FACILITY=Rubin-SST&FACILITY=Rubin-AuxTel&FACILITY=Rubin-AllSky&... . (The SIAv2 standard provides a way for a service to advertise all legal values to its users, so the necessary strings could be discoverable, though building such a query would still require "external" knowledge of the "Rubin-*" convention.)
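The two query styles can be sketched as follows. The endpoint URL is a placeholder, the column list is arbitrary, and the helper name sia_query_url is hypothetical; the ADQL uses the standard ObsTAP table name ivoa.ObsCore. Python's urllib.parse.urlencode with doseq=True emits one FACILITY=... pair per list element, which is the repeated-parameter form that DALI interprets as a logical OR.

```python
from urllib.parse import urlencode

# Placeholder endpoint; no service URL is defined on this page.
SIA_ENDPOINT = "https://example.org/api/sia/query"

# ObsTAP: a single ADQL predicate matches every "Rubin-*" facility_name.
# 'Rubin-%' (rather than 'Rubin%') also pins down the hyphen in the prefix.
ADQL = (
    "SELECT obs_id, facility_name, instrument_name, obs_collection "
    "FROM ivoa.ObsCore "
    "WHERE facility_name LIKE 'Rubin-%'"
)


def sia_query_url(facilities):
    """SIAv2 query URL with one FACILITY parameter per value (DALI: repeated = OR)."""
    return SIA_ENDPOINT + "?" + urlencode({"FACILITY": list(facilities)}, doseq=True)


print(sia_query_url(["Rubin-SST", "Rubin-AuxTel", "Rubin-AllSky"]))
# https://example.org/api/sia/query?FACILITY=Rubin-SST&FACILITY=Rubin-AuxTel&FACILITY=Rubin-AllSky
```

The asymmetry is the point: the ADQL string never needs updating when a new "Rubin-*" facility appears, whereas the SIAv2 caller must know and list every value.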

From interactions with existing ObsCore-structured data at other archives, it seems preferable to have the instrument_name values be intelligible on their own to a well-informed astronomer, and likely (though not guaranteed) to be unique across all major facilities.  This suggests not using strings like just "Camera" or "FUV".  "LSSTCam" and "LATISS", already well-established names in the project, certainly both satisfy this desire.

Since the precise physical location of the two all-sky cameras may not be particularly useful or important, and they are likely to be close to each other in any event, one option may be to represent both as a single ObsCore "facility" and CAOM2 "telescope" with two instruments.

Collection-naming considerations

As noted above, the ObsCore standard suggests including facility and instrument indications in obs_collection values. For our purposes, however, indicating an "LSST" (in the new definition) purpose for a data collection should be a key goal, and collection names should be chosen accordingly.

Appropriate classifiers to include in the collection name may include:

  • DR1 vs. DR2 vs. DR3, etc. vs. Prompt vs. Science Validation campaigns
  • Level of processing, particularly distinguishing single-epoch from coadded image data
  • Main-survey vs. special-programs data
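One way the classifiers above could combine with the "LSST-DRnn-*", "LSST-Prompt-*" and "SPxx" elements of the proposal is sketched below. The function name, the zero-padding to two digits, and the free-form product suffix (standing in for, e.g., a processing-level tag) are assumptions; Science Validation campaigns are left out because the proposal does not yet place them.

```python
from typing import Optional, Union


def collection_name(release: Union[int, str], product: str,
                    special_program: Optional[str] = None) -> str:
    """Build an obs_collection value.

    release: "Prompt", or a data release number (1 -> "DR01").
    product: remainder of the name, e.g. a processing-level tag (hypothetical).
    special_program: optional Special Programs field, e.g. "SP07" (hypothetical).
    """
    parts = ["LSST"]
    parts.append("Prompt" if release == "Prompt" else f"DR{int(release):02d}")
    if special_program is not None:
        parts.append(special_program)
    parts.append(product)
    return "-".join(parts)


print(collection_name(1, "coadd"))          # LSST-DR01-coadd
print(collection_name(1, "coadd", "SP07"))  # LSST-DR01-SP07-coadd
```

Putting the Special Programs field after the release prefix, as the proposal does, keeps a single prefix test (e.g., "LSST-DR01-") sufficient to select everything from one release.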

ObsCore allows observation metadata entries to be distinguished by two attributes, dataproduct_type and dataproduct_subtype. The former has a rigidly specified set of values (i.e., it's an enumeration), and we are likely to use only "image" and "spectrum" at any significant level. The latter is entirely publisher-defined, and we can use it, in particular, to represent our own internal dataset types. This will help users map between the IVOA view of the data and the Butler view of the data as they move back and forth between different interfaces. It means that, for instance, the PVIs and difference images from Prompt processing could be in the same collection, distinguished only by different dataproduct_subtype values; or they could be in different collections.
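A minimal sketch of the split between the two attributes: dataproduct_type is checked against the enumeration as listed in ObsCore 1.1, while dataproduct_subtype is left free-form and here carries a Butler dataset type name. The class name, the collection name and the dataset type strings ("calexp", "deepDiff_differenceExp") are illustrative only, not decisions recorded on this page.

```python
from dataclasses import dataclass

# dataproduct_type values enumerated in ObsCore 1.1
DATAPRODUCT_TYPES = frozenset({
    "image", "cube", "spectrum", "sed", "timeseries",
    "visibility", "event", "measurements",
})


@dataclass(frozen=True)
class ObsCoreRow:
    obs_collection: str
    dataproduct_type: str      # must come from the ObsCore enumeration
    dataproduct_subtype: str   # publisher-defined: here a Butler dataset type

    def __post_init__(self):
        if self.dataproduct_type not in DATAPRODUCT_TYPES:
            raise ValueError(f"{self.dataproduct_type!r} is not an ObsCore dataproduct_type")


# PVIs and difference images in one collection, told apart only by subtype
# (collection and dataset type names are illustrative).
pvi = ObsCoreRow("LSST-Prompt-images", "image", "calexp")
diff = ObsCoreRow("LSST-Prompt-images", "image", "deepDiff_differenceExp")
```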

It's also important to distinguish between collection names in the production ObsTAP/SIAv2 services and collection names that may appear in ObsCore representations of the data in a single Butler Gen3 repo. The former live unambiguously in a global namespace spanning, at least, the entire history of the LSST survey (unless we run separate services, i.e., at distinct URL endpoints, for each data release). The latter need not be guaranteed to conform to the naming standard for the public data services, but they should ideally still have a readily discernible relationship to the identity of the underlying repo.

Appendix: Comparable data in other archives

CADC

  • Examples of facility_name values: "CFHT 3.6m", "Chandra X-ray Observatory", "CTIO-1.5m", "Gemini-North", "Gemini-South", "HST", "JCMT", "SUBARU", "XMM-Newton"
  • Examples of instrument_name values: (for facility "HST") "ACS", "ACS/HRC", "ACS/SBC", "ACS/WFC", "NICMOS", "NICMOS/NIC2", "NICMOS/NIC3", "STIS/CCD", "WFC3", "WFC3/IR", "WFC3/UVIS", "WFPC2", "WFPC2/WFC"; (for other facilities) "ACIS-I", "CPAPIR", "EMOS1", "EMOS2", "EPIC PN", "F2", "FLAMINGOS", "FOCAS", "GMOS-N", "GMOS-S", "GNIRS", "HSC", "MegaPrime", "Optical Monitor", "PHOENIX", "RGS1", "RGS2", "SCUBA-2", "Suprime-Cam", "WIRCam"

MAST

  • Examples of facility_name values: "CALTECH", "HEASARC/GSFC", "NASA-AMES", "Spitzer Science Center", "STScI", and many nulls
  • Examples of instrument_name values: "ACS", "ACS/HRC", "ACS/SBC", "ACS/WFC", "ASTRO-2 WUPPE", "COS/FUV", "COS/NUV", "FGS", "GALEX", "HSP/POL", "HSP/UNK/UV2", "IRAC", "Kepler"


