Database Meeting 2022-02-23

Date

23 Feb 2022

Attendees

Igor Gaponenko, Fabrice Jammes, Unknown User (npease), Andy Salnikov, Joanne Bogart, Colin Slater, John Gates, Fritz Mueller

Goals

Please add topics for discussion as needed.

Discussion items

Time	Item	Who	Notes
	Project news	Fritz Mueller
- Anything interesting from the DM Leadership Team Virtual Face-to-Face Meeting, 2022-02-15 to 17:
  - mostly concerned about "missing functionality" in light of the upcoming telescope commissioning
  - scaling issue of the processing pipeline after moving from HSC to DP0.2
  - transitionary plan for SLAC (USDF)
  - many small files in Butler are a big concern. Rucio doesn't scale well for millions of small files.
  - image service support to be added for DP0.2
  - meeting "embargos"
- News from USDF:
  - hardware is coming and being installed
  - HSC reprocessing will be tested at SLAC using USDF resources (Fritz Mueller is involved in this)
- No additional requirements/expectations for Qserv have been aired so far, except ingesting user data products:
  - There are a lot of uncertainties in many areas, including API, authorization, and registering tables in the TAP service.
  - Another problem is retaining large result sets obtained from Qserv (5 GB scale, etc.). How to handle this?
  - How to ingest from the perspective of "naive" users?
    - Igor Gaponenko: a service-like ingest workflow for users might be a solution.
    - concerns regarding user data quality (depending on the data format, data may need to be further processed/sanitized)
	Status of DP0.2
- Igor Gaponenko: a slice (140 GB, 40 chunks) of the catalog has been ingested into the `small` Qserv cluster at NCSA (DM-33733).
- Fabrice Jammes: ingest workflow readiness for ingesting into IDF?
- Colin Slater: what's the scale of the catalog compared to DP0.1? What's the ETA for the catalog inputs?
  - roughly a 5x increase in the amount of data (wider tables)
  - data will be ready in late March
  - ingesting will start in April
  - overall DP0.2 readiness is June 2022
- Joanne Bogart: can we talk again about the extra tables for that catalog?
  - response received via the team's Slack channel: Access to DC2 truth info such as SN parameters, stellar types, variability light curves, cluster locations, etc., has been requested by DP0 delegates, and the Rubin project has inquired if those data could be made available. JB: I don't know how they will be used or whether the content will ultimately end up in Qserv.
  - DESC truth match tables may be added.
  - Image metadata (expo) tables may be added to support images.
  - A possibility of having match tables between DP0.1 and DP0.2. Maybe, depending on a request.
- Possible issues:
  - 5x more data to lock in RAM for each chunk when processing JOINs
    - DP0.1 has 2 TB in total spread across 1000 chunks, which requires 2 GB of RAM per chunk on average
    - this means 10 GB/chunk for DP0.2
    - the expected dynamic range of the chunk size may be a potential issue here
  - using SSDs may offset the need for locking chunks in memory
	Supporting modalities of Qserv deployments
- This is a big topic that was discussed at the previous meetings. A number of ideas have been expressed, including moving database initialization into `init` containers.
- Fritz Mueller may have a proposal:
  - the best practice in Kubernetes is to put databases and the relevant init containers in pods separate from the application pods
  - the app pods would have an init container waiting for the database init containers to be ready
  - a problem is that database sockets can't be shared between pods in this architecture
  - also, root access to the databases will be required (why?)
  - to address this, Fritz proposes to colocate pods on the same node. This still needs to be verified.
- IG: are init containers sequenced across different pods of the same deployment?
- IG: what about schema migration? Would it be done via the database pod?
- IG: the Replication system needs root access to the CZAR database
- Fritz Mueller will put these ideas into a proposal to be discussed further. The document will be shared with the team before the next group meeting. Relevant changes in the `entrypoints` are expected.
	Adding Replication system's "registry" (redirector) to the Replication system's deployment	Igor Gaponenko
- The context: DM-33376
- Besides adding a new service, a schema change in the Replication database will be required.
- Fabrice Jammes: `qserv-operator` needs to be modified and extended.
- Fritz Mueller will build a new Qserv release (or tag Qserv) based on the current state of the `main` branch before
	Progress on other topics discussed at the previous meeting (Database Meeting 2022-02-09)
- The "Grand unification" of the Git packages (DM-33618)
- Moving Qserv documentation into the container
  - Fritz Mueller has it almost ready to go (after the PR is created and reviewed): DM-32709
- Fabrice Jammes: do we still need to discuss migrating configurations of the test catalogs at https://github.com/in2p3-dp0 to the current API of the Ingest system?
  - ingest workflow config files would need to store the API version as well. This will be added later in the Git package `qserv-ingest`.
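
The per-chunk memory estimate quoted under "Status of DP0.2" can be reproduced with a short back-of-envelope calculation. This is only a sketch: the function name is hypothetical, decimal units (1 TB = 1000 GB) are assumed, and it assumes DP0.2 keeps the same number of chunks as DP0.1, which the notes do not state.

```python
# Back-of-envelope estimate of the average RAM needed to lock one chunk,
# using the figures quoted in the notes.

GB_PER_TB = 1000  # assumption: decimal units


def avg_chunk_size_gb(total_gb: float, num_chunks: int) -> float:
    """Average chunk size in GB (hypothetical helper for illustration)."""
    return total_gb / num_chunks


dp01_total_gb = 2 * GB_PER_TB  # "DP0.1 has 2 TB in total"
dp01_chunks = 1000             # "spread across 1000 chunks"
scale_factor = 5               # "roughly x5 increase in the amount of data"

dp01_chunk_gb = avg_chunk_size_gb(dp01_total_gb, dp01_chunks)

# Assumption: the chunk count stays the same for DP0.2.
dp02_chunk_gb = avg_chunk_size_gb(dp01_total_gb * scale_factor, dp01_chunks)

print(f"DP0.1: {dp01_chunk_gb:.0f} GB/chunk on average")
print(f"DP0.2: {dp02_chunk_gb:.0f} GB/chunk on average")
```

These are averages only. As noted in the discussion, the dynamic range of chunk sizes may matter more than the mean, so the largest chunks could need considerably more memory.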

Action items
