This page attempts to enumerate and briefly describe the concrete implementations of the Registry and Datastore interfaces (see DMTN-056) that LSST DM will need to provide.

Registry Backends

Operations

The master read-write SQL Registry that manages all files in the Data Backbone and provides direct database access to the Batch Production Service.

May have a different schema than other Registries.

All URIs point to the DataBackboneStorage.

Reads, and even more so writes, are limited to Operations processes and specific staff.

Pipelines executed in the Batch Production Service may not have direct database and Data Backbone access at runtime. Instead, input data and files will be pre-staged to local Registries and Storage, and output data and files will be collected and later written to the Operations Registry and DataBackboneStorage.
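The pre-stage and collect pattern described above can be sketched roughly as follows. This is only an illustration under stated assumptions: plain dictionaries stand in for the Operations and local Registries, directories stand in for DataBackboneStorage and scratch Storage, and the function names stage_inputs and collect_outputs are hypothetical, not part of any LSST interface.

```python
import shutil
import tempfile
from pathlib import Path


def stage_inputs(backbone, scratch, names):
    """Copy the named inputs to scratch; return a local registry (name -> URI)."""
    scratch.mkdir(parents=True, exist_ok=True)
    local_registry = {}
    for name in names:
        target = scratch / name
        shutil.copy2(backbone / name, target)
        local_registry[name] = target.resolve().as_uri()
    return local_registry


def collect_outputs(scratch, backbone, names, ops_registry):
    """Copy outputs back to the backbone and record them in the ops registry."""
    for name in names:
        target = backbone / name
        shutil.copy2(scratch / name, target)
        ops_registry[name] = target.resolve().as_uri()


with tempfile.TemporaryDirectory() as root:
    backbone = Path(root) / "backbone"   # stand-in for DataBackboneStorage
    scratch = Path(root) / "scratch"     # stand-in for local Storage
    backbone.mkdir()
    (backbone / "raw_001.fits").write_text("raw pixels")
    # Hypothetical Operations Registry: dataset name -> URI
    ops_registry = {"raw_001.fits": (backbone / "raw_001.fits").resolve().as_uri()}

    local_registry = stage_inputs(backbone, scratch, ["raw_001.fits"])

    # The "pipeline" touches only scratch, never the backbone directly.
    data = (scratch / "raw_001.fits").read_text()
    (scratch / "calexp_001.fits").write_text(data.upper())

    collect_outputs(scratch, backbone, ["calexp_001.fits"], ops_registry)
    staged_names = sorted(local_registry)
    registered_names = sorted(ops_registry)
```

In this sketch the job sees only the local registry and scratch directory; the Operations Registry is updated in a separate step once the job has finished.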

Clients:

  • Batch Production Service may use its own clients.

DeveloperUserDB

A read-write full-SQL Registry that uses MyDB space to override or extend the tables in the Operations registry.

Not a confirmed part of the LDF, but it could be a good solution for many use cases that would otherwise be hard, given the need to balance Data Backbone integrity requirements against developers' need to run test processing against major production runs.

URIs may point to DataBackboneStorage or DeveloperPersistentStorage.

Controls writes to DeveloperPersistentStorage.
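One way to picture "override or extend" is a view that prefers rows from a user-owned table and falls back to the Operations table. The sketch below uses an in-memory SQLite database purely for illustration; the table names, the view, and the backbone:// and devstore:// URI schemes are all hypothetical, and the real Registry schema is not specified here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ops_dataset  (dataset_id TEXT PRIMARY KEY, uri TEXT);
    CREATE TABLE user_dataset (dataset_id TEXT PRIMARY KEY, uri TEXT);
    -- User rows win; Operations rows show through where no user row exists.
    CREATE VIEW dataset AS
        SELECT dataset_id, uri FROM user_dataset
        UNION ALL
        SELECT dataset_id, uri FROM ops_dataset
        WHERE dataset_id NOT IN (SELECT dataset_id FROM user_dataset);
""")
conn.executemany("INSERT INTO ops_dataset VALUES (?, ?)", [
    ("calexp/1", "backbone://calexp/1.fits"),
    ("calexp/2", "backbone://calexp/2.fits"),
])
conn.executemany("INSERT INTO user_dataset VALUES (?, ?)", [
    ("calexp/2", "devstore://alice/calexp/2.fits"),  # overrides an ops row
    ("coadd/7", "devstore://alice/coadd/7.fits"),    # extends with a new row
])
merged = dict(conn.execute("SELECT dataset_id, uri FROM dataset"))
conn.close()
```

Queries against the view see three datasets: one untouched Operations row, one overridden row, and one row that exists only in the user table. The Operations table itself is never modified.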

Clients:

  • ...

DataRelease

A read-only, full-SQL Registry that holds the data products that are part of an official data release (including raw).

Copies much of the content in the Operations registry, with a different underlying schema.

All URIs point to the DataBackboneStorage.

A specialized release process would create the DataRelease Registry.

Clients:

  • ...

PlatformUserDB

A read-write full-SQL Registry that uses MyDB space to override or extend tables present in a DataRelease Registry.

URIs may point to the DataBackboneStorage or PlatformUserStorage.

Controls writes to PlatformUserStorage.

Clients:

  • ...

SQLite

A read-write full-SQL Registry that uses a SQLite database, intended for off-line or local management of small amounts of data.

URIs will usually point to LocalPosixStorage; whether they may sometimes point to remote Datastores is an open question.
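As a rough illustration of a file-backed SQLite Registry, the sketch below maps a dataset type and data ID to a URI. The class name SqliteRegistry, its methods, and the single-table schema are assumptions made for this example; they are not the Registry interface defined in DMTN-056.

```python
import sqlite3
import tempfile
from pathlib import Path


class SqliteRegistry:
    """Hypothetical minimal registry: (dataset_type, data_id) -> URI."""

    def __init__(self, db_path):
        self._conn = sqlite3.connect(str(db_path))
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS dataset ("
            " dataset_type TEXT NOT NULL,"
            " data_id TEXT NOT NULL,"
            " uri TEXT NOT NULL,"
            " PRIMARY KEY (dataset_type, data_id))"
        )

    def add(self, dataset_type, data_id, uri):
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT INTO dataset VALUES (?, ?, ?)",
                (dataset_type, data_id, uri),
            )

    def find(self, dataset_type, data_id):
        row = self._conn.execute(
            "SELECT uri FROM dataset WHERE dataset_type = ? AND data_id = ?",
            (dataset_type, data_id),
        ).fetchone()
        return row[0] if row else None

    def close(self):
        self._conn.close()


with tempfile.TemporaryDirectory() as root:
    db_path = Path(root) / "registry.sqlite3"
    registry = SqliteRegistry(db_path)
    registry.add("raw", "visit=1,ccd=2", "file:///data/raw/v1_c2.fits")
    registry.close()

    # Reopen the same file to show that the contents persist.
    reopened = SqliteRegistry(db_path)
    found = reopened.find("raw", "visit=1,ccd=2")
    missing = reopened.find("raw", "visit=9,ccd=9")
    reopened.close()
```

Because the whole Registry lives in one file, it can be copied or moved alongside the data it describes, which suits the off-line and local use described above.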

Clients:

  • ...

GenericDB

A read-write full-SQL Registry implemented on some readily-available SQL database server (e.g. MySQL or PostgreSQL), suitable for managing large collaborative Datastores and supporting large-scale processing runs that happen outside the LDF (e.g. HSC processing at Princeton/NAOJ, DESC processing at NERSC).

May also be used inside the LDF (presumably with an Oracle database) as an alternative to DeveloperUserDB for managing DeveloperPersistentStorage.

Clients:

  • ...

LimitedNoDB

A read-write "limited" Registry that uses simple key-value dictionaries persisted as e.g. YAML files instead of a full SQL database.

Can be used when staging datasets to/from scratch space during batch processing. Because of this, it still needs to be able to record provenance (even if it isn't the final home of the provenance information it records).

May also be used to support specialized drivers for certain SuperTasks that allow them to run on arbitrary user-provided files (i.e. obs_file or processFile functionality).

URIs always point to LocalPosixStorage.
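A "limited" Registry of this kind might look roughly like the sketch below: a key-value dictionary persisted to a text file, with a second dictionary recording which inputs each dataset was produced from. To stay within the Python standard library, the sketch writes JSON rather than the YAML mentioned above. The class name LimitedRegistry, its methods, and the file layout are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


class LimitedRegistry:
    """Hypothetical key-value registry persisted as a single JSON file."""

    def __init__(self, path):
        self._path = Path(path)
        if self._path.exists():
            self._state = json.loads(self._path.read_text())
        else:
            self._state = {"datasets": {}, "provenance": {}}

    def add(self, key, uri, inputs=()):
        """Record a dataset URI and the keys of the inputs it was made from."""
        self._state["datasets"][key] = uri
        self._state["provenance"][key] = list(inputs)
        self._path.write_text(json.dumps(self._state, indent=2, sort_keys=True))

    def uri(self, key):
        return self._state["datasets"][key]

    def inputs_of(self, key):
        return self._state["provenance"][key]


with tempfile.TemporaryDirectory() as root:
    path = Path(root) / "registry.json"
    registry = LimitedRegistry(path)
    registry.add("raw/1", "file:///scratch/raw_1.fits")
    registry.add("calexp/1", "file:///scratch/calexp_1.fits", inputs=["raw/1"])

    # Reload from disk, as a later transfer step to another Registry might.
    reloaded = LimitedRegistry(path)
    reloaded_uri = reloaded.uri("calexp/1")
    reloaded_inputs = reloaded.inputs_of("calexp/1")
```

The provenance dictionary is what lets a later step copy both the dataset entries and their lineage into a full SQL Registry.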

Clients:

  • ...

Datastore Backends

DataBackboneStorage

The storage system associated with the Data Backbone.

Writes are controlled by the Operations Registry.

Depending on various other requirements (load balancing, disaster recovery, a release process that may produce new files, etc.), the Storage for the DataRelease may be different from the Storage for Production.

Clients:

  • ...

DeveloperPersistentStorage

A shared storage system at the LDF, used by non-operator developers for long-term storage of inputs and outputs of small- and medium-scale processing runs.

Writes are controlled by DeveloperUserDB.

Clients:

  • ...

PlatformUserStorage

The storage system used to store Level 3 data products produced by science users in the LSST Science Platform.

Writes are controlled by PlatformUserDB.

Clients:

  • ...

LocalPosixStorage

A Datastore that is built on a regular POSIX filesystem accessible directly to the client.
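A minimal sketch of a POSIX-backed Datastore follows, assuming a put/get interface that exchanges bytes for file:// URIs. The class name PosixDatastore and its method signatures are hypothetical and are not the Datastore interface from DMTN-056.

```python
import tempfile
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


class PosixDatastore:
    """Hypothetical datastore rooted at a directory on a local filesystem."""

    def __init__(self, root):
        root = Path(root)
        root.mkdir(parents=True, exist_ok=True)
        self._root = root.resolve()

    def put(self, data, relative_path):
        """Write bytes under the root and return a file:// URI."""
        target = (self._root / relative_path).resolve()
        if self._root not in target.parents:
            raise ValueError("path escapes the datastore root")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
        return target.as_uri()

    def get(self, uri):
        """Read back the bytes stored at a file:// URI."""
        parsed = urlparse(uri)
        if parsed.scheme != "file":
            raise ValueError("only file:// URIs are supported")
        return Path(url2pathname(parsed.path)).read_bytes()


with tempfile.TemporaryDirectory() as root:
    store = PosixDatastore(Path(root) / "store")
    uri = store.put(b"pixels", "raw/v1_c2.fits")
    round_trip = store.get(uri)
    try:
        store.put(b"x", "../escape.fits")
        escaped = True
    except ValueError:
        escaped = False
```

The root check in put is a design choice for this sketch: it keeps every stored file under the datastore directory, so the URIs it hands to a Registry stay within a single tree.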

Is its own client.

