This page gives an overview of the Data Management-to-Camera/Telescope interfaces and how raw data and metadata are transferred from the observatory to the DM-resident science data archive.
Primary documents are:
- LSE-68 (Data Acquisition Interface between Data Management and Camera)
- LDM-230 (Data Management Automated Operations)
Also relevant are:
- LSE-69 (Interface between the Camera and Data Management)
- LSE-75 (Control System Interfaces between the Telescope & Data Management)
- LSE-76 (Infrastructure Interfaces between the Summit Facility and Data Management)
- LSE-77 (Infrastructure Interfaces Between Data Management and the Base Facility)
- LSE-78 (LSST Observatory Network Design)
- LSE-130 (List of data items to be exchanged between the Camera and Data Management)
- LSE-140 (Auxiliary Instrumentation Interface between Data Management and Telescope)
Camera Pixel Data
The camera pixel data comes from the camera data buffer (also known as the "Two-Day Store") within the Camera Data System (CDS, also known as the "DAQ" for Data Acquisition system; not to be confused with the DM DAC or Data Access Center). The buffer is a large flash-based storage system in a rack in the summit computer room. (gpdf 2014.6.10: The buffer servers are on embedded processors in the CDS, probably running RTEMS.)
The Camera provides a client library for accessing pixel data. We use a "pull" interface: we provide an image identifier (not sequential, likely time-based) and a desired set of amplifiers (grouped into CCDs and rafts), and the CDS provides the pixel data in memory as 32-bit signed integers, with pixels from each amplifier (also known as a segment) grouped together and bias/overscan pixels attached. (Note that the baseline documents still say raw data will be 16-bit unsigned integers, as LCR-131 is still pending approval.) TBD: The exact structure of the memory object. The image identifier is published by the Camera Control System (CCS) no later than the start of CCD readout in a startReadout event sent using the Observatory Control System (OCS) messaging middleware. It is also available by subscribing via the CDS client library to notifications of new images.
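The pull interface described above might look roughly like the following sketch. All names here (MockCdsClient, fetch_amplifier, fetch_ccd, the amplifier geometry, and the image-identifier format) are assumptions for illustration, not the real Camera client library API.

```python
import numpy as np

class MockCdsClient:
    """Stand-in for the Camera Data System client library (hypothetical)."""

    # Assumed geometry: science pixels plus a serial-overscan region per amplifier.
    AMP_SHAPE = (2000, 512 + 64)

    def fetch_amplifier(self, image_id, raft, ccd, amp):
        """Pull one amplifier segment as 32-bit signed integers
        (per LCR-131; the baseline documents still say 16-bit unsigned)."""
        seed = abs(hash((image_id, raft, ccd, amp))) % 2**32
        rng = np.random.default_rng(seed)  # fake pixel data for the sketch
        return rng.integers(0, 2**18, size=self.AMP_SHAPE, dtype=np.int32)

    def fetch_ccd(self, image_id, raft, ccd, amps=range(16)):
        """Pull the requested amplifiers of one CCD, grouped per segment."""
        return {a: self.fetch_amplifier(image_id, raft, ccd, a) for a in amps}

client = MockCdsClient()
# Time-based (non-sequential) image identifier, as described above.
segments = client.fetch_ccd("2014-06-10T03:12:45", raft="R22", ccd="S11")
```

The caller names the image and the amplifiers it wants; the client returns the segments in memory, overscan attached, rather than the CDS pushing data at DM.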
The Base Data Management Control System (DMCS), acting as the Archiver commandable entity for the OCS, starts replicator jobs when notified via a startIntegration event (or another event for dark frames or engineering frames) that an archivable image is to be taken. These jobs subscribe to the CCS startReadout event to obtain the image identifier and then pull two copies of the image, one crosstalk-corrected and one raw, via the client library and, indirectly, the Summit-to-Base network. The jobs run on prepared replicator nodes at the Base, each holding a long-lived connection over the long-haul international network to a paired distributor node at the Archive; the pixels are transferred over this connection. TBD: This batch-like execution was chosen for fault tolerance and to keep the execution mechanisms consistent across archiving, Alert Production processing, and Data Release Production processing; as the number of machines and processes performing the archiving has decreased and the need to gather state via subscriptions has increased, the trade-offs may now favor a long-lived process system. TBD: The packaging and mechanism for transferring the pixels are not yet defined; the transfer could be file-based (FITS or otherwise) or memory object-based, and is likely to use lossless compression. All successfully archived images are recorded in a Base DMCS internal database.
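The replicator-job flow above can be sketched as follows. The class and function names (MockCds, MockDistributor, replicator_job) and the event payloads are invented for illustration; the real packaging and transfer mechanism are, as noted, still TBD, so lossless zlib compression stands in here.

```python
import zlib

class MockCds:
    """Fake CDS client returning placeholder pixel bytes."""
    def pull(self, image_id, sensor, kind):
        return f"{image_id}/{sensor}/{kind}".encode() * 100

class MockDistributor:
    """Fake paired distributor node at the Archive."""
    def __init__(self):
        self.received = []
    def send(self, image_id, sensor, kind, payload):
        self.received.append((image_id, sensor, kind, len(payload)))

def replicator_job(cds, distributor, sensors, start_readout_event):
    """One replicator job: take the image id from the CCS startReadout
    event, pull both copies of each sensor's pixels, and forward them
    over the long-lived connection to the paired distributor node."""
    image_id = start_readout_event["imageId"]
    for kind in ("crosstalk-corrected", "raw"):
        for sensor in sensors:
            pixels = cds.pull(image_id, sensor, kind)
            # Packaging TBD; lossless compression is assumed here.
            distributor.send(image_id, sensor, kind, zlib.compress(pixels))

dist = MockDistributor()
replicator_job(MockCds(), dist, ["R22_S11"], {"imageId": "IM_20140610_000123"})
```

One transfer per sensor per copy: the distributor ends up holding both the crosstalk-corrected and the raw version of the image.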
If an image is not archived immediately after it is read out, it will be detected and archived by the Catch-Up Archiver. This OCS commandable entity is a process within the Base DMCS that uses the CDS client to list the contents of the camera data buffer. The listing is compared against the Base DMCS internal database of archived images to find unarchived images, which are then transferred over the international network during times when science data is not being transferred.
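The core of the Catch-Up Archiver's detection step is a set difference between the buffer listing and the archive database; a minimal sketch (function and variable names are illustrative):

```python
def find_unarchived(buffer_images, archived_db):
    """Catch-Up Archiver detection step: compare the Two-Day Store
    contents (listed via the CDS client) against the Base DMCS
    database of successfully archived images."""
    return sorted(set(buffer_images) - set(archived_db))

# The buffer still holds IM_0002; the database shows only the others archived.
buffer_contents = ["IM_0001", "IM_0002", "IM_0003"]
archived = {"IM_0001", "IM_0003"}
to_catch_up = find_unarchived(buffer_contents, archived)
```

Each image in the result is then transferred opportunistically, outside science-data transfer windows.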
The Engineering and Facilities Database (EFD) contains records of all commands, events, and telemetry information sent over the OCS messaging middleware. It is a relational database implemented using MySQL. The OCS will maintain a replica at the Base; DM will also maintain replicas at the Base and the Archive (and, if approved, the French Center), as well as edited and possibly reformatted versions as part of the Science Data Archive in each DAC. All this replication uses MySQL-native technology and, on both the Summit-to-Base and international networks, uses the non-DM bandwidth allocation.
The replicator jobs will obtain the metadata required for Alert Production processing of an image directly from the OCS messaging middleware or from the EFD. TBD: This may also apply to calibration metadata (which could be different), or the calibration metadata could be looked up from the EFD by the Calibration Products Production. The OCS provides an interface to query the last-set value of a particular topic; this is expected to suffice for the vast majority of metadata items (and to be more efficient than querying the EFD for a particular time range). These items include the visit identifier for standard science images; measured detector, atmospheric, and environmental characteristics; various timestamps (such as shutter motion records); and observatory configuration parameters for provenance. It is expected that most of these queries will be issued when the readout starts, in order to obtain timely values.
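The last-set query pattern can be sketched as below. The class, the topic names, and the API shape are assumptions for illustration; the real OCS middleware interface is not specified here.

```python
import time

class MockOcsQuery:
    """Stand-in for the OCS last-set-value query interface (hypothetical)."""
    def __init__(self):
        self._last = {}

    def publish(self, topic, value):
        """Record the most recent sample for a topic, as the EFD also would."""
        self._last[topic] = (time.time(), value)

    def last_set(self, topic):
        """Return (timestamp, value) of the most recent sample, or None.
        This avoids scanning an EFD time range for a single item."""
        return self._last.get(topic)

ocs = MockOcsQuery()
ocs.publish("scheduler.visitId", 123456)          # illustrative topic names
ocs.publish("environment.seeing_arcsec", 0.67)

# At startReadout, a replicator job snapshots the metadata it needs:
metadata = {t: ocs.last_set(t)[1]
            for t in ("scheduler.visitId", "environment.seeing_arcsec")}
```

Issuing the snapshot at readout start keeps the values timely relative to the exposure.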
The OCS is responsible for generating visit identifiers. The CDS is not currently expected to provide the visit identifier along with each image, although this may change. The visit identifier must therefore be associated with an image by reading the last-set value of the visit identifier topic, possibly with additional checks to confirm that the image is indeed part of the visit. The CCS may attach the visit identifier to the events it sends. TBD: It is not yet known whether visit identifiers will be sequential numbers. The DM process control middleware will be able to handle visits composed of arbitrary numbers of images. TBD: Processing for visits of more than two images is not yet defined.
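One plausible form of the "additional checks" is a timing sanity test: the visit identifier must have been published shortly before the image was taken. The function name and the 30-second window below are illustrative assumptions, not a specified design.

```python
def associate_visit(image_time, visit_topic_sample, max_age_s=30.0):
    """Associate an image with a visit via the last-set visitId topic.
    visit_topic_sample is an assumed (publish_time, visit_id) pair;
    reject the association if the visit was declared after the image
    or too long before it (window size is illustrative)."""
    visit_time, visit_id = visit_topic_sample
    if not (0.0 <= image_time - visit_time <= max_age_s):
        return None  # image not plausibly part of this visit
    return visit_id

# Visit declared 5 s before the exposure: accepted.
ok = associate_visit(105.0, (100.0, 42))
# Visit declared 100 s earlier: rejected as stale.
stale = associate_visit(200.0, (100.0, 42))
```

A check like this guards against attaching a stale visit identifier to, say, an unscheduled engineering exposure.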
Image Data Destinations
All crosstalk-corrected and raw images with associated metadata are written in transmission format to the 14-day network outage buffer at the Base. The raw images are also written in science archive format (not necessarily FITS files, although FITS file export will be provided) to the tape archive disk cache at the Base, the Archive, and, if approved, the French Center. Images are not written immediately to tape because tapes should be organized spatially, not temporally; the disk cache allows for this reorganization. The same science-archive-format raw images and metadata are written to raw image caches at the Chilean and US Data Access Centers. These caches will contain at least the two latest visits covering any point on the sky. TBD: Is the network outage buffer really necessary given the tape cache and DAC cache at the Base? TBD: The requirements for prompt access to images from DM by observatory staff or processes remain to be analyzed; if such a requirement materializes, a private interface to, e.g., the network outage buffer or a DAC may be necessary.
Images and metadata that require processing are transferred to the Alert Production Cluster. For science visits, these are the crosstalk-corrected images; for catch-up processing, these are the raw images. Processing jobs pull the data from the distributor nodes, finding the appropriate one by contacting the Archive DMCS, which maintains a central directory service. The processing jobs are started upon notification by the Observatory Control System (OCS) that processing is needed; for normal survey science images this occurs via a nextVisit event. For catch-up processing, the Catch-Up Archiver is typically configured to submit the processing jobs once the image to be processed has been identified in the camera data buffer. TBD: Other events for special science programs, calibration, and engineering. The processing jobs are configured to support the type of image(s) being taken, whether science, calibration, or engineering.
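The directory-service lookup can be sketched as follows. The class and method names, the event payload, and the node naming are invented for illustration; the real lookup protocol is not defined in the interface documents.

```python
class MockArchiveDmcs:
    """Hypothetical central directory service at the Archive DMCS,
    mapping (image, sensor) to the distributor node holding the pixels."""
    def __init__(self, placements):
        self._placements = placements

    def locate(self, image_id, sensor):
        return self._placements[(image_id, sensor)]

def start_processing(dmcs, next_visit_event, sensor):
    """On a nextVisit (or equivalent) notification, find the distributor
    node to pull from; the worker then fetches the pixels from that node."""
    image_id = next_visit_event["imageId"]
    return dmcs.locate(image_id, sensor)

dmcs = MockArchiveDmcs({("IM_0002", "R22_S11"): "distributor-07"})
node = start_processing(dmcs, {"imageId": "IM_0002"}, "R22_S11")
```

Keeping the placement map in one service means workers need no static knowledge of which distributor received which sensor's data.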
The observatory's auxiliary instrumentation includes some instruments that produce numbers (like GPS water vapor monitors), but it may include others that produce images, such as the auxiliary telescope that takes spectra of known standard stars to calibrate the atmosphere. All instruments report results as OCS telemetry. Large telemetry items such as images are not contained in OCS middleware messages; they are instead placed as files into a "blob store" and pointed to using references in the middleware messages. The EFD replication will replicate this blob store as well as the relational database. A tool like rsync is expected to be sufficient.
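The blob-store indirection described above might look like this sketch; the function name, the reference layout, and the use of a content hash as the key are all illustrative assumptions.

```python
import hashlib
import json

def publish_large_telemetry(blob_store, topic, payload):
    """Place a large telemetry item (e.g. an auxiliary-telescope spectrum)
    in the blob store and publish only a reference over the middleware.
    Keying blobs by content hash is an assumption of this sketch."""
    key = hashlib.sha256(payload).hexdigest()
    blob_store[key] = payload  # stands in for a file in the blob store
    # Only this small message travels over the OCS middleware.
    return json.dumps({"topic": topic, "blobRef": key, "size": len(payload)})

store = {}
msg = publish_large_telemetry(store, "auxTel.spectrum", b"\x00" * 1024)
ref = json.loads(msg)["blobRef"]
```

Replicating the blob store is then an ordinary file-sync problem, which is why a tool like rsync suffices alongside the MySQL replication.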
The auxiliary instrumentation information is not currently expected to be used directly for Alert Production processing. Instead, it will be used during the daytime execution of the Calibration Products Production to create an atmospheric model. That model will be stored in the Science Data Archive in each DAC (with latency no greater than 24 hours from the start of the night's observing) and can be used to improve calibration of measurements taken during the night and recorded in the Level 1 Data Product database. As a result, transfer bandwidth and latency are not critical for this data.
If some auxiliary instrumentation information turns out to be needed for Alert Production, it will be collected as image metadata at the Base. Processing in the Archive will not be allowed to query the EFD, since its replication from the Base to the Archive will not be synchronous and may be delayed relative to the science images.