Needs Updating
Portal Aspect:
The initial version of this is the PDAC.
Initial components include:
- SUIT query/visualization portal using Firefly — data retrieved via DAX web services
- DAX web services
  - Low-level:
    - dbserv
      - Raw ADQL/SQL interface, output format translation
      - Talks to Qserv
  - Higher-level:
    - metaserv
      - Queries databases (e.g. ScienceCcdExposure table)
      - Generates lists of Butler ids (dataset type plus dataId)
    - imgserv: mosaic/cutout, regeneration, output format translation operations
- Low-level:
  - Files in GPFS with organization as prescribed in RFC-95, RFC-249
  - Qserv database
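As a rough illustration of the metaserv step above (query an exposure table, then emit Butler ids as a dataset type plus a dataId), here is a minimal sketch. It is not the metaserv implementation: the `ButlerId` type, the `find_butler_ids` and `make_demo_db` helpers, the column names, and the SDSS-style dataId keys are all assumptions made for the example, and an in-memory SQLite database stands in for the real catalog database.

```python
import sqlite3
from typing import Any, Dict, List, NamedTuple


class ButlerId(NamedTuple):
    """Hypothetical container: a dataset type plus the dataId selecting one dataset."""
    dataset_type: str
    data_id: Dict[str, Any]


def find_butler_ids(conn, dataset_type: str, ra_min: float, ra_max: float) -> List[ButlerId]:
    """Select exposures in an RA range and turn each row into a ButlerId.

    The column names (run, camcol, field, filterName, ra) are assumed for
    this sketch; the real ScienceCcdExposure schema may differ.
    """
    rows = conn.execute(
        "SELECT run, camcol, field, filterName FROM ScienceCcdExposure "
        "WHERE ra BETWEEN ? AND ? ORDER BY run, camcol, field",
        (ra_min, ra_max),
    ).fetchall()
    return [
        ButlerId(dataset_type, {"run": run, "camcol": camcol, "field": field, "filter": band})
        for run, camcol, field, band in rows
    ]


def make_demo_db():
    """Build a tiny in-memory stand-in for the exposure table (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ScienceCcdExposure "
        "(run INTEGER, camcol INTEGER, field INTEGER, filterName TEXT, ra REAL, decl REAL)"
    )
    conn.executemany(
        "INSERT INTO ScienceCcdExposure VALUES (?, ?, ?, ?, ?, ?)",
        [
            (4192, 1, 300, "r", 15.0, 0.5),
            (4192, 2, 301, "g", 18.5, -0.2),
            (5000, 3, 10, "i", 45.0, 1.0),
        ],
    )
    return conn


if __name__ == "__main__":
    for butler_id in find_butler_ids(make_demo_db(), "calexp", 10.0, 20.0):
        print(butler_id.dataset_type, butler_id.data_id)
```

A list produced this way could then be handed to imgserv or to a Butler client, which would resolve each id to an actual dataset.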
Later:
An authentication/authorization component will be added that connects to, or passes credentials through to, all other components.
A Global Metadata Service will be created to track groups of datasets (Butler repositories) in the Data Backbone. The Global Metadata Service also stores information about available databases.
metaserv then talks to the Global Metadata Service.
imgserv could be expanded to become a read-only "butlerserv". There are two additional functions: returning Butler locations of datasets, which requires a Butler client on the remote end to retrieve and deserialize the datasets, and format translation in which an internal-to-imgserv Butler retrieves the in-memory object for the dataset and streams it to the recipient in a desired format.
Qserv per-user databases will be added to hold the results of portal queries and to serve as inputs to them; dbserv will be able to create and query these.
Other RDBMS-based (non-Qserv) databases will be added, including the SQuaSH QC database, provenance databases, and non-Qserv per-user databases; dbserv will be able to create (where appropriate) and query these.
Per-user file storage will also be added.
The Data Backbone will manage the files, replacing the direct GPFS interface (GPFS will still be used underneath). It will perform inter-site replication and transparent (except for latency) retrieval of files from the tape archive. The Butler must be able to retrieve files from the Data Backbone. This can be a staged process (requesting files through a dbbToButler translation utility and then using a Butler configured to talk to the local filesystem), but it will be more convenient and desirable to have the Butler talk directly to the Data Backbone.
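The staged process can be pictured with the sketch below. It is only a stand-in under stated assumptions: `stage_from_backbone` is a hypothetical function, not the dbbToButler utility, and a plain directory copy takes the place of whatever replication or tape retrieval the Data Backbone would actually perform.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def stage_from_backbone(backbone_root, relative_paths: Iterable[str], staging_root) -> List[Path]:
    """Copy requested files into a local tree, preserving their relative paths.

    Hypothetical stand-in for a dbbToButler-style staging step: the source is
    just a directory here, not a real Data Backbone endpoint.
    """
    staged = []
    for rel in relative_paths:
        src = Path(backbone_root) / rel
        dst = Path(staging_root) / rel
        # Recreate the repository layout so a filesystem-based reader can find the file.
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        staged.append(dst)
    return staged
```

After staging, a Butler configured for the local filesystem would be pointed at the staging directory. Having the Butler talk directly to the Data Backbone would remove this extra copy and the bookkeeping around it.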
Notebook Aspect:
The initial version of this is for Science Pipelines QA on processed HSC data and does not access SDSS or WISE data in the PDAC.
- Minimal authentication/authorization (Unix user ids on JupyterHub server)
- Local JupyterHub server
- Files in GPFS
- "Monolithic" non-Qserv RDBMS (expected to be MySQL, could even be Oracle or Postgres) instance on new lsst-db containing HSC catalog data products and per-user databases
- Filesystem Butler interface
  - Used with local filesystem and GPFS
- SQLAlchemy (as our current RDBMS-agnostic interface) or Python DB-API interfaces to databases
  - Connects to RDBMS
  - Connects to SQuaSH QC database
- Science Pipelines stack installed and available in the notebook
- Firefly visualization widgets available in the notebook
- Batch computing on the Verification Cluster via separate shell or shell escape from the notebook
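To make the Python DB-API access path concrete, here is a minimal notebook-style sketch. The `qa_metric` table, its columns, and the `mean_metric_by_filter` and `make_demo_db` helpers are invented for illustration, and SQLite (from the standard library) stands in for the MySQL instance so that the example runs anywhere.

```python
import sqlite3
from typing import Dict


def mean_metric_by_filter(conn, metric: str) -> Dict[str, float]:
    """Return {filter: mean value} for one metric using plain DB-API calls."""
    cur = conn.cursor()
    try:
        # Parameter binding keeps user input out of the SQL text.
        cur.execute(
            "SELECT filter, AVG(value) FROM qa_metric "
            "WHERE metric = ? GROUP BY filter ORDER BY filter",
            (metric,),
        )
        return dict(cur.fetchall())
    finally:
        cur.close()


def make_demo_db():
    """Create an in-memory table with a few made-up QA measurements."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE qa_metric (metric TEXT, filter TEXT, value REAL)")
    conn.executemany(
        "INSERT INTO qa_metric VALUES (?, ?, ?)",
        [
            ("seeing", "r", 0.75),
            ("seeing", "r", 1.25),
            ("seeing", "i", 0.5),
            ("depth", "r", 26.0),
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    print(mean_metric_by_filter(make_demo_db(), "seeing"))
```

The placeholder style depends on the driver: `sqlite3` uses `?`, while MySQL drivers typically use `%s`. Smoothing over differences like this is one reason to prefer SQLAlchemy as the RDBMS-agnostic layer.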
Later:
The Data Backbone and its Butler interface are described above.
DAX services will be implemented to allow added operations on top of file retrieval and database query, including TAP, SIA, and other VO interfaces.
An OpenStack cluster with (for example) Kubernetes will be provided for interactive computing.
The JupyterHub server will be expanded with features such as:
- Subdomain-per-user and wildcard DNS/HTTPS for security (I think this is best practice)
- KubeSpawner (for example) to provide elasticity for notebooks and compute
The batch cluster could be moved to OpenStack as well.
Straightforward transport of computations from the notebook world to the batch world, controlled by the notebook, remains to be defined.
When Qserv-based data products, per-user Qserv databases, and other RDBMS-based databases are available, connectivity to them through Python DB-API, SQLAlchemy, and the Butler will be provided.