Date

Attendees

Notes from the previous meeting

Discussion items

Discussed | Item | Notes
(tick)Project news

Fritz Mueller:

  • LSST is delayed until Spring 2025; all contingency has been used, and a supplemental funding request to the funding agencies is in the works.
  • planning on going to Chile in May for about 1 week
  • may go for longer trips (a few weeks each time) in the future (yet to be decided)
  • we have an additional 0.2 FTE of Andy Salnikov 's time to work on APDB and PPDB

Colin Slater 

(tick)Qserv at USDF

Igor Gaponenko :

  • data and management operations migrated to the service account rubinqsv 
  • cluster expanded to 15 nodes
  • some system packages are still missing on the new nodes, working with IT folks to fix that
  • data of slac6dev was expanded/cloned to the new nodes 007-015. Node 001 is not going to be used to run the master services. When the expansion of this Qserv instance is finished, we will have 14 workers. The ETA for bringing this instance back is (hopefully) the end of the day today.
  • qserv6dev  is up and running
    • should it get renamed to qserv15dev  or something else?
    • the replication level after the expansion is 2 and 1/3. I'm going to bring it down to 2. Replica rebalancing is in progress.
  • a similar rescaling will be done to slac6prod as the next step.
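
The "replication level of 2 and 1/3" is the average number of replicas per chunk across the cluster. A minimal sketch of the arithmetic and of how rebalancing candidates might be picked (the chunk-to-replica-count data below is hypothetical, not output of Qserv's actual replication system):

```python
def avg_replication_level(replicas_per_chunk: dict) -> float:
    """Average number of replicas per chunk across the cluster."""
    counts = list(replicas_per_chunk.values())
    return sum(counts) / len(counts)

# Hypothetical example: three chunks, one over-replicated after the expansion.
chunks = {101: 2, 102: 2, 103: 3}
level = avg_replication_level(chunks)  # (2 + 2 + 3) / 3 = 2.33...

# Chunks holding more replicas than the target level of 2 are candidates
# for replica removal during rebalancing.
over_replicated = [c for c, n in chunks.items() if n > 2]
```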

Igor Gaponenko : regarding Kubernetes, the latest news from Yee was that they're going to set up a separate ZFS-based filesystem on each node for PVs. It will be visible with the following command (last time I checked, I didn't notice any new filesystem beyond what we already had for the "low-tech" deployment mode):

rubinqsv@sdfqserv007 ~ $ zfs list
NAME      USED  AVAIL     REFER  MOUNTPOINT
zfspool  4.32T  27.5T     4.32T  /zfspool
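
To spot when the PV filesystem appears, one could diff the dataset names in `zfs list` output against the known baseline. A small sketch; the `zfspool/pv` dataset name is a guess for illustration, not a confirmed name:

```python
def dataset_names(zfs_list_output: str) -> set:
    """Extract dataset names from `zfs list` output (first column, header skipped)."""
    lines = zfs_list_output.strip().splitlines()[1:]
    return {line.split()[0] for line in lines}

baseline = {"zfspool"}  # what the "low-tech" deployment already has

# Example of what the output might look like once a PV dataset is added:
sample = """\
NAME         USED  AVAIL     REFER  MOUNTPOINT
zfspool     4.32T  27.5T     4.32T  /zfspool
zfspool/pv   128K  27.5T      128K  /zfspool/pv
"""
new_datasets = dataset_names(sample) - baseline  # {"zfspool/pv"}
```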

Fritz Mueller , Fabrice Jammes : any other news on the Kubernetes installation on the nodes?

  • Fabrice Jammes: completed required Cyber training, waiting for the SLAC UNIX password to be reset
(tick) DP03

Fritz Mueller :

(tick)Upgrade the XROOTD version to ssi-5.3.x

Fritz Mueller:

  • Qserv will get moved to this version this week since it's been tested at USDF
  • will move to the newest (yet to be built) version as soon as it is ready
(tick)Slow (or "zombie") worker queries at USDF

Fritz Mueller:

  • the problematic queries are no longer seen at IDF after the user who was initiating them was identified. The vulnerability is still there, and it needs to be addressed.
  • there is a preliminary (potential) solution to the problem, which is to modify Qserv's RelationGraph to generate faster queries where RefMatch tables are involved. The idea is not to materialize sub-chunks but to put a restrictor on the sub-chunks of the "mother" tables. In this case, MySQL would benefit from indexes on the latter.
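
The proposed rewrite can be illustrated with a toy query generator: instead of materializing per-sub-chunk tables, emit a WHERE restrictor on the sub-chunk ID column of the "mother" table so MySQL can use an index on it. The column and table names (`subChunkId`, `dirObjectId`, etc.) are illustrative, not Qserv's actual generated SQL:

```python
def subchunk_query(mother: str, ref_match: str, sub_chunk_ids: list) -> str:
    """Toy version of the proposed plan: no sub-chunk materialization,
    just an indexable restrictor on the mother table's sub-chunk column."""
    ids = ",".join(str(i) for i in sub_chunk_ids)
    return (
        f"SELECT d.* FROM {mother} AS d "
        f"JOIN {ref_match} AS m ON m.dirObjectId = d.objectId "
        f"WHERE d.subChunkId IN ({ids})"
    )

sql = subchunk_query("Object_1234", "RefMatch_1234", [7, 8, 9])
```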

John Gates : any progress in figuring out how to cancel ongoing queries at workers? Apparently, there is some synchronization blockage in the worker code preventing XROOTD's Finalize  from doing so. Or is it something else?

  • thinking about canceling all tasks of a query if one of its tasks gets canceled. This is blocked by the PR to be reviewed by Igor Gaponenko 
  • Igor Gaponenko proposed to pick heavier queries to guarantee they run at worker MySQL for minutes. The set of test queries used before might not be long-lived enough.
  • Andy Hanushevsky : a lock may be involved in the Qserv code.

Fritz Mueller has proposed to extend workers with a bookkeeping mechanism.
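
Such a bookkeeping mechanism could track which tasks belong to each query, so that canceling one task also cancels its siblings. A minimal sketch with hypothetical names (the real worker code is C++ and interacts with XROOTD):

```python
from collections import defaultdict

class TaskBook:
    """Tracks tasks per query; canceling one task cancels the whole query."""

    def __init__(self):
        self._tasks = defaultdict(set)  # queryId -> set of taskIds
        self.cancelled = set()          # taskIds flagged for cancellation

    def register(self, query_id, task_id):
        self._tasks[query_id].add(task_id)

    def cancel_task(self, query_id, task_id):
        # One task of the query got canceled: cancel all of its siblings too.
        if task_id in self._tasks[query_id]:
            self.cancelled |= self._tasks[query_id]

book = TaskBook()
for t in (1, 2, 3):
    book.register("q42", t)
book.cancel_task("q42", 2)  # cancels tasks 1, 2, and 3
```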

(tick)Status of the new Parquet  to CSV translator (partitioner)

Fabrice Jammes any news on the progress of:

Fabrice Jammes :

  • Sabine Elles has lost access to JIRA. The PR will move on after access is restored.

Igor Gaponenko : there is a piece of news on a related subject. The pipeline folks are about to introduce row grouping in the generated Parquet files. Experiments have been made with group sizes of 0.5 GB  and 1 GB :
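
Parquet writers typically specify row-group size in rows, while the experiment sizes above are in bytes, so a conversion via the average encoded row size is needed. A back-of-the-envelope sketch (the average row width below is a made-up figure, not a measured one):

```python
def rows_per_group(target_bytes: int, avg_row_bytes: int) -> int:
    """Approximate rows per Parquet row group for a target group size in bytes."""
    return max(1, target_bytes // avg_row_bytes)

avg_row = 2_000  # hypothetical average encoded row size in bytes
half_gb = rows_per_group(500_000_000, avg_row)    # 250_000 rows
one_gb = rows_per_group(1_000_000_000, avg_row)   # 500_000 rows
```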

(tick)Problems with the k8s -based integration test at UKDF

Fabrice Jammes : has the issue with the k8s-based Qserv integration test reported by Greg Blow been resolved?

  • yes, the problem was with the ingest documentation, which didn't mention the requirement to configure Git to use LFS. LFS is needed because the data files used by the ingest are stored in GitHub.
  • working with Greg to get him unblocked
  • considering improvements to be made to the integration test
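
A quick way to detect that misconfiguration: without Git LFS configured, a clone leaves small pointer stubs in place of the real data files, and LFS pointer files start with a fixed version line. A check like this (illustrative, not part of the integration test; the oid/size values are fabricated for the example) would flag them:

```python
LFS_HEADER = "version https://git-lfs.github.com/spec/v1"

def is_lfs_pointer(text: str) -> bool:
    """True if the file content is a Git LFS pointer stub, not the real data."""
    return text.lstrip().startswith(LFS_HEADER)

# What a stub left by a clone without `git lfs install` looks like:
stub = """version https://git-lfs.github.com/spec/v1
oid sha256:4d7a...
size 1048576
"""
assert is_lfs_pointer(stub)
assert not is_lfs_pointer("objectId,ra,dec\n1,10.5,-30.2\n")  # real CSV data
```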

Action items

  •