Date

Attendees

Notes from the previous meeting

Discussion items

Discussed | Item | Notes
(tick) Project news

Fritz Mueller:

  • No DMLT meeting this week due to the Operations Review Red Team (a friendly version of the review)
  • Colin is on that review
  • (Operations) Monthly SITCOM General Assembly meeting. It's a good source of up-to-date information on the situation at the Summit.
  • Ongoing effort on preparing Operations Rehearsal 3, which includes a 24-hour simulation of operations activity using the ComCam DAQ, including ingest, copying to USDF, data reduction, and prompt processing. It will happen in the first week of April. Fritz is helping out with Campaign Management in this context.
  • The star-tracker (small) cameras are installed outside the telescope assembly. These are used for taking movies. Good progress here.
  • Good progress on the refrigeration pathfinder. 
  • On the Jira migration to the cloud:
    • Slack support channel #rubin-to-cloud
    • For the frozen tickets, allegedly "someone" is working on setting up the redirector service. No further details yet.
    • Andy Salnikov: there is a convenient Slack plugin for Jira integration. Look for the application "Jira Cloud" (go to Applications, enter /Jira, and configure the integration).
  • There is a group for interviewing Richard's replacement.
    • Fritz is not involved in the process
    • Fritz (and his management subtree) has been transferred from Richard to Phil

Colin Slater: no major news from the Red Team review.

(tick) USDF

Igor Gaponenko:

  • The Joint Data Facilities meeting didn't happen this Monday due to a scheduling conflict for key people.
  • Talked (informally) to Yee regarding the status of the 28 Qserv nodes. He had talked to Omar. Unfortunately, there is no good news here and no ETA either.
  • Fritz Mueller:
    • The target date for the extended cluster is Summer. We need to have the cluster well before that to evaluate the setup, expand Qserv, and address issues (if any).
    • The delays may be caused by a number of conflicts between the Stanford and SLAC IT teams over resource allocation, rack placement, etc.
    • There is a new SLAC IT person who replaced Amedeo.

Andy Salnikov:

  • Not much news on Cassandra this week.
  • Worked on the replication script to copy data from Cassandra to the PostgreSQL-based PPDB.
  • In discussions with the APDB developers on various topics, including deployment, etc.
  • Question for Fritz Mueller: is there any word on whether Cassandra is going to be used during the Operations Rehearsal?
(tick) Current status of Qserv and Qserv builds

Igor Gaponenko:

  • The status of the Qserv clusters can be seen on the Qserv Deployments page.
  • Both Qserv clusters at IDF were upgraded to the latest production release 2024.3.1-rc2.
  • Among other improvements and bug fixes, the release got a fix for:
    • DM-42638
    • Inspected both clusters for the remaining "garbage" (abandoned message tables and other temporary tables)
    • Found over 3.5 million such tables in each cluster
    • Started the manual cleanup
      • UPDATED 2024-03-13 11:30 AM: 1.4 million tables have been removed from each cluster
      • The cleanup is expected to finish next week
  • An interesting observation was made on the lifetimes of the worker pods in the -int cluster: the workers have different lifetimes, even though the restart counters of all pods were still 0.
    • Fritz Mueller:
      • The effect may be caused by a forced GKE upgrade
      • Automated upgrades were disabled for Qserv at some point last year
      • Will investigate this
  • A problem observed earlier with intermittent issues in GHA CI has been investigated using:
  • and fixed in:
    • DM-43274
    • The bug was introduced during the migration of the Qserv control plane protocol to HTTP
    • I see no urgency in pushing the fix into production Qserv. It mostly affects catalog ingest operations and, to a minor extent, Qserv monitoring.
    • However, building another release, 2024.3.1-rc3, is not a bad idea.
      • Fritz Mueller is not sure whether we should do this now or wait until John Gates finishes addressing the "dark" mode review. Realistically, this may happen within a few days. Friday may be a good day to build the release.
      • Igor Gaponenko: we
  • Fritz Mueller suggested extending the Qserv deployment to submit a "probe" query during Qserv startup.
  • Fritz Mueller has done more work on submodules:
    • Updated the submodules
    • Updated the configuration
(tick) Merging qserv-operator into the qserv source tree and changing container builds

Fritz Mueller: on the planned/ongoing work:

  • Started working on consolidating the Python code within the Qserv source tree into a single sub-tree
  • This paves the way for the next projects with Igor Gaponenko on wrappers and entry points
  • Started looking at bringing qserv-operator into the source tree. This builds a good foundation for subsequent work on entry points, configuration management, and deployment models
(tick) Addressing an issue with the "dark" queries

The context:

John Gates:

  • working on improving the worker configuration
(tick) HTTP-based Qserv frontend

Igor Gaponenko on the ongoing work on extending the integration test:

Igor Gaponenko on ingesting the user-generated data products:

  • Started designing the REST services
  • The services will be added to the HTTP-based Qserv frontend
  • Fritz Mueller:
    • we still don't have a resolution (feedback) on the format of the payload
    • we could decompose the REST API to combine submitting queries with uploading into the 
(tick) New dispatch

Fritz Mueller:

  • This is the project John Gates is going to work on after finishing the current work on the "dark" queries
  • The work needs to be done on a separate feature branch, like the one set up by Fritz and Nate when working on the lite containers and the integration tests
  • Fritz Mueller is going to help John Gates with Git

Igor Gaponenko was curious about the development plan

Action items

  •