Date

Attendees

Notes from the previous meeting

Discussion items

Discussed | Item | Notes
(tick) Project news
(tick) Investigating worker lockups

 Latest news:

  • a possible source of the lockups was identified as an application deadlock when attempting to create/query/delete materialized sub-chunk tables in the work databases
  • DM-37983
  • the new Qserv release 2023.2.1-rc2 was built, deployed, and tested in slac6 and IDF -int. The fix seems to work as expected.
  • work on wiring Qserv Czar into the Replication system's monitoring has begun
  • DM-37692

Fritz Mueller:

  • the release will be deployed to -prod tomorrow during the usual "maintenance window"
  • may optionally run pre-upgrade tests comparing the existing Qserv with the new one to check that the new release has no negative performance impact
  • one observation: the Replication system's worker registry container didn't get upgraded during the ad-hoc update of -int

John Gates has made a proposal to speed up worker data transfers:

John Gates is working on another improvement for worker monitoring. It should be wired into the Replication system's monitoring and the Dashboard.


(tick)An overview of the future object table

Colin Slater:

  • Cell-based Coadds and Shear Measurements (slides)
(tick) Improved result delivery in Qserv

A dedicated EPIC has been registered in JIRA:

It will host a series of related developments that will proceed in parallel with the mainstream version of Qserv. The current focus is on:

Kian-Tat Lim:

  • think about small-result optimization
  • concerned about time-based or space-based "garbage collection" at workers
  • consider an explicit mechanism using XROOTD (or SSI)
  • in the long run - consider using a different protocol (HTTP) for result delivery
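The time-based and space-based "garbage collection" mentioned above can be illustrated with a minimal sketch. It is not taken from Qserv: the `ResultFile` record, the `select_for_removal` function, and the file names are hypothetical, and the policy shown (expire by age first, then evict the oldest files until a size budget is met) is only one possible interpretation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResultFile:
    """Hypothetical record describing one result file kept at a worker."""
    name: str
    size_bytes: int
    created_at: float  # seconds since the epoch


def select_for_removal(files, now, max_age_sec, max_total_bytes):
    """Return the names of the files to delete, oldest first.

    Time-based rule: any file older than max_age_sec is removed.
    Space-based rule: the oldest remaining files are removed until
    the total size fits into max_total_bytes.
    """
    ordered = sorted(files, key=lambda f: f.created_at)
    expired = [f for f in ordered if now - f.created_at > max_age_sec]
    kept = [f for f in ordered if now - f.created_at <= max_age_sec]

    total = sum(f.size_bytes for f in kept)
    evicted = []
    while kept and total > max_total_bytes:
        oldest = kept.pop(0)
        total -= oldest.size_bytes
        evicted.append(oldest)

    return [f.name for f in expired + evicted]


# Illustrative data only.
files = [
    ResultFile("q1.result", 400, created_at=0.0),
    ResultFile("q2.result", 300, created_at=50.0),
    ResultFile("q3.result", 500, created_at=90.0),
]
to_remove = select_for_removal(files, now=100.0, max_age_sec=60.0, max_total_bytes=600)
print(to_remove)
```

In this example q1.result is removed by the age rule and q2.result by the space rule, while the newest file is kept.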

Igor Gaponenko:

  • a result schema is currently sent back to Czar with each result
  • this may add significant overhead for queries with small results
  • investigate an option for deducing the schema of the result table at Czar from its local version of the tables
(tick) Qserv at USDF
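The idea of deducing the result schema at Czar can be sketched as follows. This is a hedged illustration, not the Qserv implementation: the `LOCAL_TABLES` dictionary, the `deduce_result_schema` function, and the table and column names are hypothetical, and only plain column projections are handled.

```python
# Hypothetical local copy of the table definitions available at Czar.
LOCAL_TABLES = {
    "Object": {"objectId": "BIGINT", "ra": "DOUBLE", "decl": "DOUBLE"},
}


def deduce_result_schema(table, columns, local_tables=LOCAL_TABLES):
    """Return [(column, type), ...] for a plain column projection.

    Returns None when the schema cannot be deduced locally (unknown
    table, or a column that is not in the local definition, such as
    an expression). In that case the caller would fall back to the
    schema shipped by the worker.
    """
    definition = local_tables.get(table)
    if definition is None:
        return None
    if columns == ["*"]:
        return list(definition.items())
    schema = []
    for col in columns:
        if col not in definition:
            return None
        schema.append((col, definition[col]))
    return schema


print(deduce_result_schema("Object", ["ra", "objectId"]))
print(deduce_result_schema("Object", ["COUNT(*)"]))
```

Returning None for anything that is not a known column keeps the worker-supplied schema as the fallback, so the optimization would apply only where the local deduction is unambiguous.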

Igor Gaponenko news:

  • the first batch of 15 nodes is about to be installed
  • each node will have a single 32-core/64-thread AMD CPU, 256 GB of RAM, and 12 x 3.5 TB NVMe drives
  • assuming ZFS on top of 2 LUNs (ZFS raidz1, which is equivalent to RAID5, on each)
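As a rough capacity estimate for the numbers above, the arithmetic can be written out as below. It assumes that the 12 drives are split evenly between the 2 LUNs (6 drives each, an assumption not stated in the notes) and that each single-parity LUN loses one drive's worth of capacity; ZFS metadata and reserved-space overhead are ignored.

```python
NODES = 15
DRIVES_PER_NODE = 12
DRIVE_TB = 3.5
LUNS_PER_NODE = 2
PARITY_DRIVES_PER_LUN = 1  # single parity, as in a RAID5-equivalent layout

# Assumption: the drives are split evenly between the LUNs.
drives_per_lun = DRIVES_PER_NODE // LUNS_PER_NODE
data_drives_per_node = LUNS_PER_NODE * (drives_per_lun - PARITY_DRIVES_PER_LUN)

raw_tb_per_node = DRIVES_PER_NODE * DRIVE_TB
usable_tb_per_node = data_drives_per_node * DRIVE_TB
usable_tb_total = NODES * usable_tb_per_node

print(f"raw per node:    {raw_tb_per_node} TB")
print(f"usable per node: {usable_tb_per_node} TB")
print(f"usable in batch: {usable_tb_total} TB")
```

Under these assumptions each node provides about 35 TB of usable space out of 42 TB raw, or about 525 TB for the first batch of 15 nodes.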

Fritz Mueller on Qserv installation options

  • all nodes will be wired into the Kubernetes cluster
  • a production Qserv will be installed
  • we will preserve an option for installing the "low-tech" Qserv on the same hardware, which will require some form of partitioning of the cluster between the production and testing activities
(tick) Development machine(s) for building and testing (the integration test) of Qserv

Options:

  • (quick solution) a VM at IDF
  • (longer term) former NCSA machines to be installed at SLAC
  • using GitHub Actions to build containers
(minus) (postponed since Fabrice wasn't here) qserv-operator and qserv-ingest

Fabrice Jammes: any news on progress in developing the fast Parquet-to-CSV translator?

Action items

  •