Igor Gaponenko :
- the first 6 nodes of the 15-node cluster have been installed and configured
- same hardware as on the older "loaner" cluster, except:
- 32 TB (vs 12 TB) of storage (10 + 2 NVMe disks configured as raidz2, the ZFS equivalent of RAID6)
- ZFS compression is enabled
- (quickly) tested the effect of the ZFS compression
- no significant performance degradation for large sequential I/O. 16 KB records: 1.4 GB/s writing, 4 GB/s reading. 1 MB records: 2.7 GB/s writing, 5 GB/s reading. The aggregate I/O capacity of a filesystem is about 16 GB/s. More tests are still needed.
- noticeable system-level CPU usage was observed during the I/O stress tests. Some of it may be caused by the compression
- the effective compression rate on the deployed catalogs DP02, DP01, and Gaia DR2 was 50%. This means that we may have ~70 TB of effective storage per node.
- Qserv instance slac6 is still not up. Awaiting the service ("shared" in the SLAC IT terminology) account to be created for running Docker containers and owning data on disks.
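The per-node capacity estimate above can be checked with simple arithmetic. This is only a sketch: it assumes that a "50% compression rate" means the data on disk takes half of its uncompressed size, and the function name is hypothetical. Under that assumption 32 TB of disk holds about 64 TB of catalog data, which is roughly in line with the ~70 TB quoted.

```python
def effective_capacity_tb(usable_tb: float, compressed_fraction: float) -> float:
    """Uncompressed data volume that fits on a node.

    usable_tb           -- usable storage on the node, in TB
    compressed_fraction -- on-disk size divided by uncompressed size
                           (assumed meaning of the "50%" figure: 0.5)
    """
    if not 0 < compressed_fraction <= 1:
        raise ValueError("compressed_fraction must be in (0, 1]")
    return usable_tb / compressed_fraction


# 32 TB per node, data assumed to shrink to half its size
capacity = effective_capacity_tb(32, 0.5)
print(f"{capacity:.0f} TB")
```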
Topics to discuss:
- setting up k8s-based Qserv: strategies for sharing resources of the 15-node cluster between the k8s and the "igor" mode deployments
Fritz Mueller :
- the discussion started at https://lsstc.slack.com/archives/C028UBS4QTX/p1679501937692239
- keep the 6-node "igor"-mode Qserv for now
- ask for all 15 nodes to be federated into Kubernetes
- simultaneously set up 2 operator-based deployments (production and pre-deployment) and "igor" mode
- on the oversubscription of resources if all deployments share all resources:
- the "production" load is expected to be rather light over the next few months, mostly from "Mobu"
- we have relatively small catalogs (DP02 and DP01) which aren't causing a lot of traffic
- memory pressure is a concern
Igor Gaponenko : we could mitigate the last problem by limiting resource usage by Docker containers and applications
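One way to limit container resource usage is with the `--memory` and `--cpus` options of `docker run`. The sketch below only builds such a command line. The image name, container name, and limit values are placeholders made up for illustration, not the settings used on the cluster.

```python
import shlex


def docker_run_with_limits(image, memory, cpus, name=None):
    """Build a `docker run` command line with memory and CPU caps.

    memory -- value for --memory, e.g. "64g"
    cpus   -- value for --cpus, e.g. 16
    """
    argv = ["docker", "run", "--detach", f"--memory={memory}", f"--cpus={cpus}"]
    if name is not None:
        argv.append(f"--name={name}")
    argv.append(image)
    return argv


# Placeholder image and limits, for illustration only
cmd = docker_run_with_limits("example/qserv-worker:latest", "64g", 16, name="qserv-worker")
print(shlex.join(cmd))
```

Under Kubernetes the analogous controls would be the per-container resource requests and limits in the pod specification.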
Fritz Mueller : decisions are still to be made on how to install the Kubernetes clusters. Various options exist here:
- v-cluster
- separate cluster
- operator
- etc.
TODOs: