• This is a draft for an upcoming technote.
  • This note is without prejudice to how the Prompt Products image data products will be made available to data-rights holders.

Abstract

Presents a mechanism for making image metadata and image data available via IVOA-standard mechanisms, and therefore also via the Portal Aspect of the RSP, essentially in real time relative to the arrival of data in a Butler repository.  Discusses where and how this capability will be deployed for staff-facing use at the Summit and at the USDF.

Concept

Starting with DP0.2 in 2022, the RSP has provided access to image data in two ways.

Users in the Notebook Aspect may access images directly through a standard Butler repository interface.

In addition, we provide an IVOA-standard solution for access to image metadata, images, and associated services through the API Aspect, i.e., via Web services.  The Portal Aspect provides image access through UI elements that take advantage of these IVOA-standard mechanisms.

The basic architecture includes:

  • An authenticated image-metadata service providing for ADQL queries, via IVOA TAP, against an image metadata table in the IVOA ObsCore format.  For each image, this provides uniform space, time, and wavelength metadata, as well as, optionally, additional Rubin-specific metadata columns.  Access to the images themselves is provided via a URL to an IVOA DataLink "links service".  (This indirection follows the scheme pioneered by CADC.)
    • Optionally, in some RSP deployments, we will also provide an IVOA SIAv2 service for access to the same metadata.  This is required for the public-facing RSP.
  • An authenticated IVOA DataLink service that responds to an image-access URL from the metadata service with a "links table" that includes, among other things, a URL to the actual image data file.  It can also include links to related data (e.g., thumbnails) and related services (e.g., cutouts), depending on configuration.
    • This service is a new Rubin creation, with the code in lsst-sqre/datalinker.
    • The image data file URL is a signed URL, generated by the DataLink service, that provides time-limited HTTPS access to the image data file, which may be on an S3 bucket.
  • A service that responds to the signed URLs for actual image data access.  For cloud-based data, this service is actually implemented by the cloud service provider.
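
The indirection described above can be illustrated with a minimal sketch.  It models a links-table response as a list of rows, using column names defined by the IVOA DataLink standard (ID, access_url, semantics, content_type), and picks out the row whose semantics value is "#this", which the standard uses for the primary data file.  The row values and URLs are invented for illustration; a real client would retrieve the table from the links service rather than hard-coding it.

```python
from typing import Optional

# Hypothetical links-table rows; the column names follow the DataLink
# standard, but every value here is made up for illustration.
LINKS = [
    {
        "ID": "ivo://example/image?abc",
        "access_url": "https://example.org/signed/abc.fits?sig=EXAMPLE",
        "semantics": "#this",
        "content_type": "application/fits",
    },
    {
        "ID": "ivo://example/image?abc",
        "access_url": "https://example.org/preview/abc.png",
        "semantics": "#preview",
        "content_type": "image/png",
    },
]


def primary_data_url(rows: list) -> Optional[str]:
    """Return the access URL of the primary data file, if one is listed."""
    for row in rows:
        if row.get("semantics") == "#this":
            return row["access_url"]
    return None


def related_links(rows: list) -> dict:
    """Map the semantics of each non-primary row to its access URL."""
    return {
        row["semantics"]: row["access_url"]
        for row in rows
        if row.get("semantics") != "#this"
    }
```

In this sketch the "#this" URL stands in for the signed, time-limited image URL, and the "#preview" row stands in for the kind of related data a links table can carry.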

For DP0.2, the ObsCore image metadata table was generated statically by a one-time ETL activity on the data in the DP0.2 dataset's Butler repository Registry database.  This process incorporated some Rubin-specific columns into the table in addition to the mandatory ObsCore ones.  The table for DP0.2 was ingested into the same Qserv database as the DP0.2 catalog tables, and named ivoa.ObsCore, a name prescribed by the standard.  Because the table is served through the same CADC-derived TAP-over-Qserv service as the DP0.2 catalogs, this makes it a so-called "ObsTAP" service.

With these services in place, the Portal Aspect is able to provide image queries based on ADQL searches of the ObsCore table.  The Firefly software recognizes the DataLink URLs and links-service responses, providing UI access to the images themselves, along with a basic cutout service.  The DP0.2 UI includes both form-based queries based on location, date/time, and wavelength/filter, and free-form ADQL queries.  Time-based queries, in particular, can be made on explicit calendar-date ranges or on relative ranges (e.g., "last 12 hours", "last 7 days").  Later versions of the Portal will include further simplified forms for the most common searches.
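
As a rough illustration of how a relative range such as "last 12 hours" could translate into an ADQL constraint, the sketch below converts UTC datetimes to MJD (the unit ObsCore uses for t_min and t_max) and builds a query string.  The function names are hypothetical, the column selection is arbitrary, and the conversion ignores time-scale subtleties such as leap seconds; it is not a description of how the Portal builds its queries.

```python
from datetime import datetime, timedelta, timezone

MJD_UNIX_EPOCH = 40587.0   # MJD of 1970-01-01T00:00:00 UTC
SECONDS_PER_DAY = 86400.0


def to_mjd(when: datetime) -> float:
    """Convert a timezone-aware datetime to MJD (leap seconds ignored)."""
    return when.timestamp() / SECONDS_PER_DAY + MJD_UNIX_EPOCH


def last_hours_query(hours: float, now: datetime = None,
                     table: str = "ivoa.ObsCore") -> str:
    """Build an ADQL query for images whose t_min falls in the last `hours`."""
    now = now or datetime.now(timezone.utc)
    start = to_mjd(now - timedelta(hours=hours))
    end = to_mjd(now)
    return (
        f"SELECT obs_id, t_min, access_url FROM {table} "
        f"WHERE t_min BETWEEN {start:.6f} AND {end:.6f}"
    )
```

Passing an explicit `now` makes the result reproducible, which is convenient when testing a query before running it against a service.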

The use of Qserv somewhat limits the query capabilities of this service, because Qserv does not implement ADQL's INTERSECTS() operation.  This also prevents, at this time, an SIAv2 service from being deployed over the DP0.2 image data, because all the SIAv2 spatial queries, interpreted as ADQL, are INTERSECTS-based.
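
To make the distinction concrete, the sketch below builds two ADQL spatial predicates against the ObsCore s_region column.  The first is the overlap test that an SIAv2-style circle search implies, which depends on INTERSECTS(); the second is a narrower point-in-footprint test written with CONTAINS().  The helper names are hypothetical, and the sketch only constructs query text: whether a given TAP backend accepts either form is not something it checks.

```python
def circle(ra: float, dec: float, radius: float) -> str:
    """ADQL CIRCLE geometry in ICRS; all arguments are in degrees."""
    return f"CIRCLE('ICRS', {ra}, {dec}, {radius})"


def intersects_clause(ra: float, dec: float, radius: float) -> str:
    """Images whose footprint overlaps the circle (requires INTERSECTS)."""
    return f"INTERSECTS(s_region, {circle(ra, dec, radius)}) = 1"


def point_in_region_clause(ra: float, dec: float) -> str:
    """Images whose footprint contains a single point (uses CONTAINS only)."""
    return f"CONTAINS(POINT('ICRS', {ra}, {dec}), s_region) = 1"
```

Either clause can be appended after WHERE in a query on the ObsCore table; the two are not equivalent, since an image can overlap a search circle without containing its center.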

The ETL from the DP0.2 Butler to its associated ObsCore table was done by code in the lsst-dm/dax_obscore repository.  Andy Salnikov created a templated, configurable framework for transforming Butler data to ObsCore, in which a number of attributes are derived, notably, from the Butler dataset type of an image.  The framework allows restriction of the ETL to designated Butler collections and dataset types.

NB: While the project has previously committed to providing image metadata in the CAOM2 data model, presumably expressed as a TAP-based materialization of that data model (as is done at CADC, MAST, and IRSA, at least), in addition to the provision of the closely related ObsCore model, a detailed architecture for the population of the CAOM2 tables has not yet been devised, let alone deployed.

  1. Note also that the ObsCore table (and CAOM2 tables) should not be confused with the "Consolidated Database" content, which goes far beyond those standardized data models, nor with the tables like "Visit" that exist in the LSST Science Data Model.  Work to more explicitly link the ObsCore and Visit tables in DP0.2 is still pending and will help to illuminate the remaining details to be decided here.

"Live" version of image services

Experience with the DP0.2 image service suggested that a similar capability would be very useful for recently acquired data, updated automatically rather than requiring the static export/ETL that was done for DP0.2.

A prototype effort was initiated.  Andy Salnikov analyzed several options, including attempting to create an ObsCore table as a database view over the Butler Registry tables, as well as the use of triggered stored procedures, and settled on an approach that involved modifications to the Butler itself to enable the parallel creation of ObsCore data alongside the population of the Registry tables for newly created datasets.  The process depends on a configuration similar to that used for the static ETL above, and is also implemented by code in the lsst-dm/dax_obscore repository.

Because this process occurs essentially as part of the normal operation of the Butler, it produces the ObsCore table in the same PostgreSQL database as the Butler Registry tables.  The CADC TAP service's existing support for Postgres is used to provide TAP access to this table.  This Postgres-backed implementation has significantly more complete geometry support than the Qserv-backed one.  It also supports multi-target queries via temporary-table upload, another capability not currently available in Qserv.

This has been deployed on the /repo/embargo Butler repository at the USDF, and is available in the USDF RSP by selecting the "LSST Live ObsCore" TAP service (https://usdf-rsp.slac.stanford.edu/api/live) via PyVO or in the TAP services menu in the USDF RSP Portal Aspect.  The actual table is named oga.ObsCore, so technically this is not an "ObsTAP" service, but the Portal Aspect is able to recognize the ObsCore data model in this table all the same, and provide a suitable query UI.  
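
A minimal PyVO sketch of querying this service might look like the following.  The service URL and table name are the ones given above; the column selection, the function names, and the use of a bearer token in an Authorization header are assumptions made for illustration, and the authentication details of the actual deployment may differ.

```python
LIVE_TAP_URL = "https://usdf-rsp.slac.stanford.edu/api/live"


def build_query(limit: int = 10) -> str:
    """ADQL for the most recent rows of the live ObsCore table."""
    return (
        f"SELECT TOP {limit} obs_id, dataproduct_type, t_min, access_url "
        "FROM oga.ObsCore ORDER BY t_min DESC"
    )


def run_query(token: str, limit: int = 10):
    """Run the query synchronously; requires network access and a token."""
    # Imported here so that build_query() is usable without these packages.
    import pyvo
    import requests

    session = requests.Session()
    # Assumption: the service accepts a bearer token in this header.
    session.headers["Authorization"] = f"Bearer {token}"
    service = pyvo.dal.TAPService(LIVE_TAP_URL, session=session)
    return service.search(build_query(limit))
```

A call such as run_query(token), with a token obtained from the RSP, would return a PyVO result set whose access_url column holds the DataLink URLs for each image.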

To support the service, the Rubin-developed DataLink "links service", based on lsst-sqre/datalinker, was deployed at USDF, and a capability to create and serve signed URLs for images in the USDF object store was also developed and deployed.

Some details are still being ironed out, but the service is clearly useful and the time has now come to decide how to proceed from this prototype.


Desired Services

Recent discussions with members of the DRP team have confirmed interest in pursuing this further.  There is clear interest in providing staff-facing access to data in /repo/main as well as in /repo/embargo, and in extending the service configuration to additional data.  In particular, DRP team members expressed a strong interest in being able to access data in user collections via this route, as well as data resulting from periodic system-integration test runs.  After some discussion, we agreed that it would not be necessary to expose all user collections, but that a means of specifying a regular expression or other matching pattern that selects a subset of users' collections would be useful.
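
The kind of pattern-based selection discussed here could be sketched as below.  The collection names, the "u/<user>/..." naming assumed for user collections, and the function name are all hypothetical; how such a pattern would actually be supplied to the ObsCore configuration has not been decided.

```python
import re


def select_collections(names: list, pattern: str) -> list:
    """Return the collection names that match the pattern in full."""
    rx = re.compile(pattern)
    return [name for name in names if rx.fullmatch(name)]


# Hypothetical collection names, for illustration only.
EXAMPLE_COLLECTIONS = [
    "u/alice/DM-12345/run1",
    "u/bob/scratch",
    "u/carol/DM-99999",
    "LSSTComCam/raw/all",
]

# Expose only ticket-named collections belonging to two designated users.
EXAMPLE_PATTERN = r"u/(alice|carol)/DM-\d+(/.*)?"
```

Using a full match rather than a prefix search keeps the selection explicit, so a collection is exposed only if the whole name fits the designated pattern.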

While the potential relevance of the work described herein to the provision of data-rights-holder facing access to Prompt Products image datasets following the embargo period is clear, in practice no decision has been made and there are genuine issues associated with whether it might be preferable to provide, for instance, daily bulk updates to 

Work has started on configuring a deployment on the Summit, as an adjunct to other data access and visualization capabilities being provided via the FIG effort.  Once again, this will require the following steps:

  1. Configuration of the "live" internal ETL from Butler dataset creation and deletions to creation and maintenance of an ObsCore table in the Summit 

Configuration Details

Summit

