Notes from the previous meeting (2 weeks ago, before PCW2022):

Discussion items

(tick) Project news

LSST has a new system commissioning manager:

DM JTM in Chile will happen on March 13, 2023.

The Joint Status Review will take place on September 14 and 15 at SLAC.

(tick) NCSA → USDF, IDF team

Fritz Mueller:

  • NCSA shutdown has been extended through the end of this week to address a problem with some files lost during file migration to SLAC
  • some issues with storage at SLAC

Igor Gaponenko asked about the possibility of modernizing bbcp. Andy Hanushevsky responded that the new tool should have a better architecture that allows:

  • better memory management
  • compression & encryption 

Igor Gaponenko: Qserv setup on the temporary (loaner) hardware:

  • 6 equally configured nodes 
  • 32 cores/64 threads: AMD EPYC 7543
  • 256 GB of RAM
  • 12 TB of NVMe (4 disks configured as RAID0)
  • A snapshot of the large6 Qserv was made at NCSA in May 2022. The snapshot will be used for setting up Qserv.
  • The preliminary plan is to install Qserv using the "igor" mode with 5 workers only. One node will run the Qserv master services (czar, czar's database, XROOTD redirector, Replication Controller, Replication Worker Registry, and the Replication database). We can afford to lose one worker snapshot (out of 6 available) since the source Qserv was run at replication_level=2.
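The reasoning behind dropping one of the 6 worker snapshots can be illustrated with a toy model. The sketch below is not Qserv's actual chunk placement (the notes do not describe it); the round-robin `place_chunks` function, the worker names, and the chunk count are all hypothetical. It only shows that when every chunk has 2 replicas on distinct workers, any single worker can be removed without losing data, while removing two workers may lose chunks.

```python
from itertools import combinations


def place_chunks(num_chunks, workers, replication_level):
    """Hypothetical round-robin placement: each chunk is stored on
    `replication_level` distinct, consecutive workers."""
    n = len(workers)
    return {
        chunk: {workers[(chunk + r) % n] for r in range(replication_level)}
        for chunk in range(num_chunks)
    }


def chunks_lost(placement, failed_workers):
    """Return the chunks that have no replica left once the failed workers are removed."""
    failed = set(failed_workers)
    return sorted(c for c, holders in placement.items() if holders <= failed)


# Assumed setup: 6 workers in the source cluster, replication level 2.
workers = [f"worker-{i}" for i in range(1, 7)]
placement = place_chunks(num_chunks=60, workers=workers, replication_level=2)

# Losing any one worker leaves at least one replica of every chunk.
single_failures_safe = all(not chunks_lost(placement, [w]) for w in workers)

# Losing two workers is not safe in general.
double_failures_safe = all(
    not chunks_lost(placement, pair) for pair in combinations(workers, 2)
)
```

In this model `single_failures_safe` is true and `double_failures_safe` is false, which matches the statement that only one worker snapshot can be given up at replication level 2.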

Fritz Mueller on the extended use of this instance:

  • this Qserv may be used for serving DP02 and other catalogs until we get the permanent hardware (procurement of the first batch of 15 nodes is ongoing).

Igor Gaponenko: developing Qserv on the VM in IDF

  • Spun up the VM of 8 vCPUs, 64 GB of RAM, and 80 GB of the "balanced" disk (HDD plus memory-based filesystem)
  • The VM runs CentOS9.
  • Set up the environment (Docker, Docker Compose, and a number of missing Python3 modules)
  • XRootD doesn't work on CentOS9: all services crash
    • Igor Gaponenko will figure out how to retain the core files in the docker-compose based environment and work with Andy Hanushevsky on the crash.
    • Andy Hanushevsky has suggested upgrading XRootD to version 5.5, which supports CentOS9. This version will be available at the end of this month (August), though 5.4.3 should also work.

Fritz Mueller on ForcedSourceOnDiaObject :

Fritz Mueller on the new version of the truth tables:

  • the tables are deployed and tested in qserv-int
  • we should wait until Colin Slater finishes QA-ing the tables
  • after that Igor Gaponenko will deploy the tables in qserv-prod, unless any issues are found with the tables.

There are still 3 issues with Qserv that need to be addressed, with the fixes deployed in IDF:

  • Case sensitivity (the directory columns of the RefMatch table should be treated by Qserv as case insensitive)
  • The 8-hour connection timeout issue still exists. mysql-proxy is suspected, or possibly the front-end MariaDB
  • Cancel queries when the client connection drops (due to the timeout?)

Andy Salnikov on the disconnects:

  • Control-C is intercepted by the mysql client, which opens a separate connection to issue KILL <id> for the relevant query.
  • we may need to fix mysql-proxy to detect client disconnects and cancel the queries. The proxy may allow specifying user callbacks for handling such disconnects.
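The idea of cancelling queries on client disconnect can be sketched as a small bookkeeping structure. This is an illustration only, not mysql-proxy code: the `QueryTracker` class, its method names, and the client and query identifiers are hypothetical, and the emitted statement simply mirrors the `KILL <id>` form mentioned above without assuming what kind of id the server expects. A real implementation would call something like `client_disconnected` from whatever disconnect callback the proxy provides and send the resulting statements over a separate connection.

```python
from collections import defaultdict


class QueryTracker:
    """Toy registry of in-flight queries per client connection (hypothetical)."""

    def __init__(self):
        # client id -> set of ids of queries that are still running
        self._in_flight = defaultdict(set)

    def query_started(self, client_id, query_id):
        self._in_flight[client_id].add(query_id)

    def query_finished(self, client_id, query_id):
        self._in_flight[client_id].discard(query_id)

    def client_disconnected(self, client_id):
        """Forget the client and return KILL statements for the queries it left running."""
        orphaned = sorted(self._in_flight.pop(client_id, set()))
        return [f"KILL {query_id}" for query_id in orphaned]


# Example: one client starts two queries, one finishes, then the client goes away.
tracker = QueryTracker()
tracker.query_started("client-a", 101)
tracker.query_started("client-a", 102)
tracker.query_finished("client-a", 101)
statements = tracker.client_disconnected("client-a")
```

Here `statements` contains a single entry for query 102, the only query still running when the client disconnected. A second disconnect notification for the same client returns an empty list, so duplicate events are harmless.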

Fritz Mueller: eventually we need to replace this aging proxy with a better one. This is one of the high-priority architectural improvements for the "new" Qserv. At some point, we will have a dedicated in-person discussion on the subject between Fritz Mueller, Andy Salnikov, John Gates, and Igor Gaponenko.

