This page responds to the DM-SST action to write a description of how our deployment strategy meets DMS-REQ-0297, and provides the input to the verification of LVV-T146.

DMSR Requirement & Interpretation

3.3.1 DMS Initialization Component
ID: DMS-REQ-0297 (Priority: 1a)
Specification: The DMS shall contain a component that, at each Center, can initialize the DM Subsystem into a well-defined safe state when powered up.
Discussion: A safe state is one that does not permit the corruption or loss of previously archived data, nor of sending spurious information over any interface.

Derived from requirements:

  • OSS-REQ-0041: Subsystem Activation
  • OSS-REQ-0307: Subsystem Initialization
  • OSS-REQ-0121: Open Source, Open Configuration
  • OSS-REQ-0122: Provenance

The requirement was written 10 years ago and does not map directly onto how we manage services today. We therefore interpret its intent to be that DM services can be automatically (re-)started into a defined state without the need for manual procedures, e.g. someone having to log on to potentially many different consoles to start services.

DM deployment Strategy

The DM service deployment model comprises:

  • Kubernetes (K8s): infrastructure for managing containerized services in one or more cluster environments. Most DM services run on K8s.
  • Knative: built on top of Kubernetes, it adds a higher-level abstraction specifically for serverless workloads.
  • Phalanx: Rubin Observatory’s GitOps repository for managing Kubernetes environments. Phalanx provides an installation and configuration platform for services deployed on Kubernetes clusters.
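Broadly, this model meets the interpretation above because deployment is declarative: the desired state of each service is recorded in configuration (for Phalanx, in Git), and Kubernetes controllers, together with the tooling that syncs a cluster from Git, repeatedly compare what is actually running against that declaration and correct any difference. After a power-up nothing is running, so the same mechanism brings every declared service back without anyone logging on to a console. The Python below is a minimal toy model of that reconciliation idea, for illustration only; it is not Kubernetes or Phalanx code, the `Cluster` class, its methods and the service names are hypothetical, and a replica count stands in for a service's full configuration.

```python
from dataclasses import dataclass, field


# Hypothetical toy model of declarative reconciliation; not Kubernetes or Phalanx code.
@dataclass
class Cluster:
    # Desired state: service name -> replica count, as declared in configuration.
    desired: dict[str, int]
    # Observed state: what is actually running right now.
    running: dict[str, int] = field(default_factory=dict)

    def power_cycle(self) -> None:
        """Simulate a power outage: every running process is lost."""
        self.running.clear()

    def reconcile(self) -> list[str]:
        """Drive the observed state toward the desired state; return actions taken."""
        actions = []
        for name, replicas in sorted(self.desired.items()):
            current = self.running.get(name, 0)
            if current != replicas:
                actions.append(f"scale {name}: {current} -> {replicas}")
                self.running[name] = replicas
        # Anything running that is not declared is removed, not left behind.
        for name in sorted(set(self.running) - set(self.desired)):
            actions.append(f"delete {name}")
            del self.running[name]
        return actions


if __name__ == "__main__":
    # Hypothetical service names, for illustration only.
    cluster = Cluster(desired={"rsp-portal": 2, "sasquatch": 1})
    cluster.reconcile()
    cluster.power_cycle()
    print(cluster.reconcile())
    # ['scale rsp-portal: 0 -> 2', 'scale sasquatch: 0 -> 1']
    assert cluster.running == cluster.desired
```

Calling `reconcile()` a second time returns an empty list, because nothing differs from the declaration. In this toy, that idempotence is what lets the same step run at every power-up without manual intervention or side effects on services that are already correct.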

Other related components 

  • Safir: Rubin Observatory’s library for building FastAPI services for Phalanx / Kubernetes clusters
  • Rucio: Service for managing large volumes of data spread across facilities at multiple institutions and organisations.
  • FTS3: Data movement service (see https://iopscience.iop.org/article/10.1088/1742-6596/513/3/032081)
  • S3: Amazon Simple Storage Service, a service offering object storage through a web service interface.

DM Services – current deployment status

The following table lists the DM services described in LDM-148, together with their deployment plan and current status (taken from a Slack discussion in #dm-arch with Kian-Tat Lim).

The exact services have changed since the last iteration of LDM-148, but the categories remain the same. The only change is that all services originally planned for the Base are now at the Summit.

All of the following services are on Kubernetes or soon will be, except for the batch system which is independent but has its own "safe" startup state.

| Service | Where | Notes | Status |
|---|---|---|---|
| Archiving (Prompt Base) | Summit, USDF | S3 on K8s. The Summit currently uses ssh/NFS to a bare-metal machine, which also comes up in a known, safe state. | Deployed |
| Planned Observation Publication (Prompt Base) (a.k.a. ObsLocTAP) | USDF | Will be Safir/Phalanx on K8s | Not deployed |
| Prompt Processing Ingest (Prompt Base) (a.k.a. auto-ingest/embargo_butler) | USDF | On K8s | Deployed |
| Observatory Operations Data (Prompt Base) | Summit | On K8s | Deployed |
| Observatory Control System (OCS) Controlled Pipeline (Prompt Base) | Summit | On K8s | Deployed |
| Telemetry Gateway (Prompt Base) | Summit | Sasquatch/Phalanx/Kafka on K8s | Deployed |
| Prompt Processing (Prompt US) | USDF | Knative on K8s | Deployed |
| Alert Distribution (Prompt US) | USDF | On K8s | Tests deployed; operational system not quite deployed |
| Prompt Quality Control (QC) (Prompt US) | USDF | Part of the Prompt Processing payload, with publication to Sasquatch/Phalanx/Kafka and InfluxDB on K8s | Deployed |
| Batch Production (Offline Production, Satellite Facility) | USDF | The batch system is independent: it uses Slurm (not on K8s) and has its own power-up safe state. PanDA and its RCEs are either on K8s or like Slurm. | Deployed |
| Offline QC (Offline Production) | USDF | Part of the Batch Production payloads, with publication to Sasquatch/Phalanx/Kafka and InfluxDB on K8s | Deployed |
| Bulk Distribution (Offline Production) | USDF | Rucio/FTS3 on K8s | Deployed |
| Data Backbone (Archive Base and US) | USDF | Rucio/FTS3 + ingest on K8s | Deployed |
| RSP Portal (Commissioning Cluster and DACs) | Summit, USDF | Phalanx on K8s | Deployed |
| RSP Notebook (Commissioning Cluster and DACs) | Summit, USDF | Phalanx on K8s | Deployed |
| RSP Web API (Commissioning Cluster and DACs) | Summit, USDF | Phalanx on K8s | Deployed |