Date

Attendees

Notes from the previous meeting

Discussion items

Discussed | Topic | Notes
(tick)Project news

Fritz Mueller, Colin Slater - any news from the DM in Chile?

Fritz might be away in Chile for a few weeks (smile)

(tick)Qserv at USDF

Igor Gaponenko :

  • the first 6 nodes of the 15-node cluster have been installed and configured
  • same hardware as the older "loaner" cluster, except:
    • 32 TB (vs 12 TB) of storage (10 + 2 NVMe disks configured as raidz2, ZFS's RAID6 equivalent)
    • ZFS compression is enabled
  • (quickly) tested the effect of the ZFS compression:
    • no significant I/O performance degradation for large sequential I/O: 1.4 GB/s writing and 4 GB/s reading 16 KB records; 2.7 GB/s writing and 5 GB/s reading 1 MB records; aggregate filesystem I/O capacity of roughly 16 GB/s. More tests are still needed.
    • noticeable CPU (system-level) usage was observed during the stress I/O tests; some of it may be caused by the compression
    • the effective compression rate on the deployed catalogs DP02, DP01, and Gaia DR2 was 50%, which means we may have ~70 TB of effective storage per node
  • the Qserv instance slac6 is still not up, awaiting creation of the service account (a "shared" account in SLAC IT terminology) for running the Docker containers and owning the data on disks
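The node storage layout described above can be sketched with standard ZFS commands. This is a hypothetical illustration, not the actual USDF provisioning: the pool name, device paths, and the lz4 compression choice are all assumptions.

```shell
# Hypothetical sketch of one node's storage setup: pool name and device
# paths are illustrative, not the actual USDF configuration.

# 10 + 2 NVMe disks in a single raidz2 vdev (double parity, RAID6-like)
zpool create qserv-data raidz2 \
    /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1  /dev/nvme3n1 \
    /dev/nvme4n1 /dev/nvme5n1 /dev/nvme6n1  /dev/nvme7n1 \
    /dev/nvme8n1 /dev/nvme9n1 /dev/nvme10n1 /dev/nvme11n1

# enable transparent compression on the pool (lz4 assumed here)
zfs set compression=lz4 qserv-data

# after loading catalogs, inspect the achieved compression ratio
zfs get compressratio qserv-data
```

A reported `compressratio` near 2.00x would match the ~50% effective compression observed on DP02, DP01, and Gaia DR2.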

Topics to discuss:

  • setting up a k8s-based Qserv
  • strategies for sharing the 15-node cluster's resources between the k8s and "igor"-mode deployments

Fritz Mueller :

  • the discussion started at https://lsstc.slack.com/archives/C028UBS4QTX/p1679501937692239
  • keep the 6-node "igor"-mode Qserv for now
  • ask for all 15 nodes to be federated into Kubernetes
  • simultaneously set up 2 operator-based deployments (production and pre-deployment) plus the "igor"-mode one
  • on the oversubscription of resources if all deployments end up sharing all resources:
    • the "production" load is expected to be rather light over the next few months, mostly from "Mobu"
    • we have relatively small catalogs (DP02 and DP01) which aren't causing a lot of traffic
    • memory pressure is a concern

Igor Gaponenko : the memory pressure could be mitigated by limiting the resource usage of the Docker containers and applications
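For illustration, per-container limits of the kind mentioned above can be set with standard Docker flags. The image, container name, and limit values below are hypothetical, not the actual deployment commands.

```shell
# Hypothetical sketch: cap memory and CPU for one Qserv worker container
# so co-located deployments cannot oversubscribe the node. The image name
# and the limit values are illustrative only.
docker run -d \
    --name qserv-worker \
    --memory 64g --memory-swap 64g \
    --cpus 16 \
    qserv/qserv-worker:latest
```

Equivalent caps would be expressed as resource requests/limits in the k8s-based deployments.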

Fritz Mueller : decisions still need to be made on how to set up the Kubernetes clusters. Various options exist here:

  • v-cluster
  • separate cluster
  • operator
  • etc.

TODOs:

(tick)Experimentation with the file-based result delivery in Qserv workers

Igor Gaponenko:

  • DM-38069
  • this is mostly done; the experimental version of Qserv passes all but 1 integration test:
    • czar now pulls result sets from workers as files using the XROOTD file protocol
    • the files are automatically removed after the results are merged into the result tables
    • no performance numbers yet; large-scale tests are still needed and will be made using slac6 at USDF after the instance is back
  • the failed integration test is an interesting one: it may be the RefMatch-type scenario. I suspect the mainstream Qserv code may have a bug that I ran into in the experimental version, or I'm missing some special "hook" used by workers to deduplicate rows in the partial result sets before delivering those to czar
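The suspected deduplication step can be illustrated generically. This is not Qserv code; it is a minimal sketch, assuming rows arrive as hashable tuples in per-chunk batches, of why a merge step must drop duplicates in a RefMatch-style query where a match pair straddling a chunk boundary can be reported by more than one chunk.

```python
def dedupe_partial_results(partial_chunks):
    """Merge per-chunk partial result sets, dropping duplicate rows.

    Sketch only: in a RefMatch-style query the same matched row can be
    produced by more than one chunk, so the merge into the final result
    table must deduplicate, preserving first-seen order.
    """
    seen = set()
    merged = []
    for chunk_rows in partial_chunks:
        for row in chunk_rows:
            key = tuple(row)  # rows assumed to be hashable tuples
            if key not in seen:
                seen.add(key)
                merged.append(row)
    return merged

# Example: two chunks both report the match (1, 'a')
chunks = [[(1, 'a'), (2, 'b')], [(1, 'a'), (3, 'c')]]
print(dedupe_partial_results(chunks))  # [(1, 'a'), (2, 'b'), (3, 'c')]
```

Without such a step, the duplicated boundary rows would inflate the merged result, which would explain the one failing integration test.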

Andy Hanushevsky suggested not using async file reads due to their poor performance on Linux

Next steps:

  • large-scale/large-result performance testing at USDF
  • code clean up
  • migrate from Protobuf-encoded messages to CSV
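The appeal of the last step can be sketched in a few lines. This is a hypothetical illustration, not the planned Qserv implementation: CSV result files need no compiled message schema and can be bulk-loaded directly into the result tables (e.g. via MySQL's LOAD DATA INFILE).

```python
import csv
import io

def rows_to_csv(rows):
    """Serialize a result-set batch as CSV text.

    Sketch of the idea behind replacing Protobuf-encoded worker->czar
    result messages with plain CSV: no schema compilation step, and the
    file is directly bulk-loadable by the database.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()

print(rows_to_csv([(1, "alpha"), (2, "beta")]))
# 1,alpha
# 2,beta
```

The trade-off is that CSV carries no type information, so column types must come from the query metadata rather than the payload itself.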

Action items

  •