Types of Image Metadata

In the operational LSST system, image metadata as contemplated by the DPDD comprises several types of information:

  • Information from telemetry and events produced by Summit systems (including their configuration information)
  • Information from DM configurations
  • Information generated by DM pipelines, including metrics and calibration data
  • Information from DM processing provenance

Much of this information is directly recorded in the metadata within persisted image files (e.g. as FITS headers when the image files are in FITS format).  In addition, the directly-recorded metadata may be augmented or replaced by other metadata obtained from database tables upon retrieval of a given image file.  Those database tables might include image metadata tables in the science database, tables in the Calibration Database, or provenance tables in the Data Backbone.
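The augment-or-replace behaviour described above can be sketched as a simple merge.  This is a minimal illustration in Python, not the DM implementation: the function name merge_image_metadata, the key names, and the values are all hypothetical, and plain dictionaries stand in for the file header and the database row.

```python
def merge_image_metadata(file_header, db_metadata):
    """Return file metadata augmented or replaced by database metadata.

    Both arguments are plain dicts (hypothetical stand-ins for a FITS
    header and a row from an image metadata table).  Keys found only in
    the database are added; keys found in both take the database value.
    """
    merged = dict(file_header)   # copy, so the recorded header is untouched
    merged.update(db_metadata)   # database values augment or replace
    return merged


# Hypothetical values, for illustration only.
recorded = {"EXPTIME": 15.0, "AIRMASS": 1.21}
from_database = {"AIRMASS": 1.19, "SEEING": 0.72}
merged = merge_image_metadata(recorded, from_database)
```

Here the database value of AIRMASS replaces the recorded one, SEEING is added, and EXPTIME passes through unchanged.  The recorded header itself is left unmodified, so the directly-recorded values remain available.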

Telemetry and Event Data

Some of this information is captured in near-realtime by the Header Service that runs on the Engineering and Facility Database (EFD) cluster at the Summit.  The data captured is what is required by the near-realtime Prompt Processing payloads (Alert Production and Raw Calibration Validation), by other Observatory systems (e.g. the Active Optics system and the LSST Atmospheric Transmission Imager and Slitless Spectrograph), and by "best practice" standards.  (In addition, the Header Service may be configured with fixed or very-slowly-changing information, such as the observatory position, that is not contained in telemetry or events.)  This information is provided as metadata within the "raw" image files delivered by the Archiver and Prompt Processing Forwarders.  As the files are ingested into the Data Backbone, their metadata will be made available as image metadata tables in the science database (part of the Consolidated Database).  These tables are also expected to be used by the Data Butler as part of its Registry.

The remainder of this information is captured in the EFD replicas at the Summit and Base.  Entries in those databases are transferred to DM and saved for disaster recovery.  In addition, it is proposed that selected entries be transferred to DM via a separate system that aggregates and transforms them into a DM-specialized version of the EFD, the Transformed EFD, for use by staff and science users.

The Transformed EFD will be queried by Calibration Products Productions in the process of generating Calibration Products and the Calibration Database.  It is not anticipated that it will be directly queried by other Science Pipelines.

In the process of generating Calibration Products or preparing for a Data Release Production, new values for metadata items may be generated from time to time as our understanding of them improves.  For example, we may update the conversion from voltage to physical units for a telemetry item, change the method of interpolation for a slowly-changing item, or change the method of averaging for a rapidly-changing item.  Values that are simply incorrect because of a hardware or software failure may also be replaced.  All changes of this type will result in new versions of the image metadata in the raw image metadata tables.  The Data Release Production, Calibration Products Productions, and the image access services may all request various versions of the image metadata: the version recorded at pixel capture time ("original"), the latest and "best" version, or a specific version from a particular (processing, not observation) date/time.  Version selection will be implemented in the Butler as a composite dataset, with pixels from a file and metadata from the database.
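The three kinds of version request can be sketched as follows.  This is a hedged illustration in Python, not the Butler API: the MetadataVersion record, the select_version and composite_dataset functions, the selector strings, and the DOMETEMP key are all hypothetical, and the sketch assumes that the "original" version is simply the earliest one on record.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class MetadataVersion:
    """One version of an image's metadata (hypothetical record layout)."""
    processed_at: datetime   # processing date/time at which it was generated
    values: dict


def select_version(versions, which="latest", as_of=None):
    """Pick one metadata version from a non-empty list.

    which="original": the version recorded at pixel capture time
                      (assumed here to be the earliest version)
    which="latest":   the most recently generated version
    which="as_of":    the newest version generated at or before `as_of`
    """
    ordered = sorted(versions, key=lambda v: v.processed_at)
    if which == "original":
        return ordered[0]
    if which == "latest":
        return ordered[-1]
    if which == "as_of":
        if as_of is None:
            raise ValueError("as_of requires a processing date/time")
        candidates = [v for v in ordered if v.processed_at <= as_of]
        if not candidates:
            raise LookupError("no metadata version exists at that date/time")
        return candidates[-1]
    raise ValueError(f"unknown selector: {which!r}")


def composite_dataset(pixels, versions, which="latest", as_of=None):
    """Combine pixels from a file with metadata chosen from the database."""
    chosen = select_version(versions, which, as_of)
    return {"pixels": pixels, "metadata": chosen.values}


# Hypothetical history: the original value, then a recalibrated one.
history = [
    MetadataVersion(datetime(2023, 1, 1), {"DOMETEMP": 11.8}),
    MetadataVersion(datetime(2023, 6, 1), {"DOMETEMP": 12.1}),
]
```

With this history, a request for the "original" version or for the version as of 1 March 2023 returns the first record, while "latest" returns the recalibrated one.  Keying the selection on processing time rather than observation time mirrors the distinction drawn in the text.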

DM Configurations

This includes configuration information associated with a so-called "obs_" package (such as camera geometry information) and configuration information used to control algorithms in Science Pipelines.

Any of this information may be included by the pipeline tasks in the metadata of their output image files (e.g. "postISRCCD" ISR-processed images).

Pipeline-Generated Information

This includes metrics generated by pipelines as well as (typically bitemporal) calibration data, from the Calibration Database or from a collection of calibration datasets.  For some cameras, camera geometry information may be one of those datasets.
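To make "bitemporal" concrete, the sketch below looks up a calibration value along two time axes: the observation time for which the value is valid, and the time at which the value was recorded.  It is an illustration in Python under stated assumptions, not the Calibration Database schema: the CalibrationEntry fields, the lookup function, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CalibrationEntry:
    """Hypothetical bitemporal calibration record."""
    valid_from: datetime    # start of the observation-time range it applies to
    valid_to: datetime      # end of that range (exclusive)
    recorded_at: datetime   # when this entry was produced
    value: float


def lookup(entries, observed_at, known_at):
    """Return the entry valid for `observed_at`, as known at `known_at`.

    Among entries whose validity range covers the observation time and
    that had been recorded by `known_at`, the most recently recorded one
    wins.  Returns None when nothing matches.
    """
    matches = [
        e for e in entries
        if e.valid_from <= observed_at < e.valid_to
        and e.recorded_at <= known_at
    ]
    return max(matches, key=lambda e: e.recorded_at, default=None)


# Hypothetical entries: the same validity range, recorded twice.
entries = [
    CalibrationEntry(datetime(2023, 1, 1), datetime(2023, 2, 1),
                     datetime(2023, 2, 5), 1.00),
    CalibrationEntry(datetime(2023, 1, 1), datetime(2023, 2, 1),
                     datetime(2023, 8, 1), 1.02),
]
```

For an observation on 15 January 2023, a lookup as of 1 March 2023 yields the first entry, and a lookup as of 1 September 2023 yields the revised one.  An observation outside the validity range matches nothing.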

Again, pipeline tasks may include any of this data in the metadata of their output image files (e.g. "calexp" processed visit images).  Processed image metadata tables in the science database (which may also be used by the Data Butler as part of its Registry) will be populated with metrics either ingested directly from pipeline outputs or extracted from the metadata of the output images.

In addition, some metrics or values produced by the pipelines will be sent to the Quality Control system, and some will be used to generate telemetry or events for Summit systems (and therefore will come back via the EFD).

Processing Provenance

This includes information on how, where, when, and, to an extent, why a given image file was produced.  This information is maintained by the Data Backbone and Workflow/Workload Management system, in conjunction with PipelineTask.  It is saved in database tables within the Data Backbone, but selected elements could be persisted in image file metadata.
