Date

Attendees

Notes from the previous meeting

Discussion items

Discussed | Item | Notes
(tick)Project news, etc.

Fritz Mueller on the project news:

  • Vacuum lime on the camera pushes the camera delivery to December (realistically to January 2024)
  • Because of that, ComCam is definitely going to be on (to be announced in the next few days)
  • See more on the project milestones: https://www.lsst.org/about/project-status

Fritz Mueller on the last round of JSRs:

  • In general it was "bumpy". However, DM did well, with only a few recommendations.

Igor Gaponenko:

Other news:

  • Frossie and Colin Slater are visiting SLAC at the end (last week) of the month for a few days to discuss Qserv, etc.
  • We need to prepare topics to discuss (issues, resources, ongoing developments, next steps, schedule, etc.)
  • Fritz Mueller on the history of that matter:
    • Originally Qserv wasn't supposed to have external backups.
    • The intent was to rely on other sites
(tick)Next Qserv release

 Changes:

  • improvements in Qserv monitoring and Web Dashboard
  • added support for the file-based result delivery
  • query cancellation improvements

Fritz Mueller :

  • started looking at the GHA/CI problems mentioned below
  • the release build is blocked by that

Fabrice Jammes:

  • will be working on extending the operator to support file-based results
  • Fritz Mueller will build a container based on the current state of the Qserv main branch and hand it over to Fabrice Jammes
  • may need to make changes to the Qserv code
(tick)Changes in the build/run-time platform

Igor Gaponenko: problems with the hard-wired UID=1000/GID=1000 for the Dockerfile user qserv.

Fritz Mueller: the problem is with the external user that runs the Qserv build container in GHA/CI

Context:

  • GHA/CI is broken
  • do we really need such a user at all?
  • if we do then we need a workaround

Fritz Mueller:

  • There is a collision on the GID due to a new group introduced in the newest update of the base Docker image (AlmaLinux 8)
  • Qserv build containers are customized for specific users
  • It's not clear what to do here. Looking at various options. More investigation is needed to find the actual source of the problem.
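
A quick way to confirm a GID collision of the kind described above is to scan the group database of the updated base image for an existing entry with the same numeric ID. The following is a minimal sketch, not part of the Qserv build: the function name and the sample group entry are hypothetical, and the notes do not say which group actually collides.

```python
def find_gid_collisions(group_text, wanted_gid, wanted_name):
    """Return names of groups that already use wanted_gid under another name.

    group_text is the content of a group database in /etc/group format:
    name:password:GID:member,member,...
    """
    collisions = []
    for line in group_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(":")
        if len(fields) < 3:
            continue  # malformed entry, ignore it
        name, gid = fields[0], fields[2]
        if gid.isdigit() and int(gid) == wanted_gid and name != wanted_name:
            collisions.append(name)
    return collisions


# Hypothetical group file content; "somegroup" stands in for the new group.
sample_group_file = "root:x:0:\nsomegroup:x:1000:\n"
collisions = find_gid_collisions(sample_group_file, 1000, "qserv")
```

Inside a container the same function could be fed the content of /etc/group; a non-empty result means GID=1000 is already taken by another group.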

Migrating to AlmaLinux 9:

  • LSST Stack seems to be moving to that version (it's still at the RFC stage)
  • Should Qserv be upgraded as well to keep up with the latest software stack (including C++20)?
  • Any reasons not to do so?
  • Fritz Mueller :
    • We upgrade packages a few times a year
    • Upgrading the OS version may require a lot more work
    • Lua is the main concern
    • It's a few days of my work
    • Though, we should do it.
  • Andy Hanushevsky :
    • CentOS 9 is causing trouble for XROOTD
    • XROOTD works on AlmaLinux 9. However, this platform is not officially supported. It's still in the dev stage. (warning)
    • Suggested waiting until the platform is released to a broader community.
(tick)"New" Qserv

Igor Gaponenko :

  • ongoing work on Czar monitoring
  • perhaps a little demo?
  • using a proprietary plotting library for visualizing time series: https://shop.highcharts.com/
  • cost-wise, it's $160/year or $368 for the "perpetual" license (not constrained by time or version)

Fabrice Jammes :

  • working on adding support for the file-based result delivery protocol in qserv-operator 
(tick)USDF

Andy Salnikov on the status of Cassandra

  • Random crashes of the containers due to OOM despite the fact that there was plenty of free memory
    • Also, the commit log may be corrupted as a result of failures to recover the databases
  • It turned out that the virtual memory (VM) size of the processes exceeded 128 TB, which is the Linux limit for a process
  • The application memory-maps many huge files, which pushes it over that limit
  • Got a solution after contacting Cassandra experts:
    • There is a config option to disable the memory mapping (found the one accidentally in one of the related discussions)
    • Tried it and got the service running
  • The bottom line: it's very complex software that requires a dedicated expert to oversee the services
    • Fritz Mueller agreed with that and proposed to discuss this with upper management
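
To catch this condition before the containers crash, the virtual memory size of a process can be compared with the limit quoted above. The sketch below assumes a Linux host, where /proc/<pid>/status reports the size in a "VmSize" line in kB; the function names and the sample text are illustrative only and were not part of the actual investigation.

```python
# 128 TB limit as quoted in the notes, taken here as 128 * 2**40 bytes.
VM_LIMIT_BYTES = 128 * 2**40


def vm_size_bytes(status_text):
    """Extract VmSize (in bytes) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmSize:"):
            # The line looks like "VmSize:   123456 kB"
            return int(line.split()[1]) * 1024
    return None  # no VmSize line (e.g. a kernel thread)


def fraction_of_limit(status_text, limit=VM_LIMIT_BYTES):
    """Return the share of the address-space limit used, or None."""
    size = vm_size_bytes(status_text)
    return None if size is None else size / limit


# Sample text standing in for a real /proc/<pid>/status file (1 TiB mapped).
sample_status = "Name:\tjava\nVmSize:\t1073741824 kB\n"
used = fraction_of_limit(sample_status)
```

On a live system the text would come from reading /proc/<pid>/status for the Cassandra process; a value approaching 1.0 means the memory maps are about to exhaust the address space even if physical memory is free.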

Igor Gaponenko:

  • issues with the configuration of the core files. Contacted Yee.

Action items

  •