Date

Attendees

Notes from the previous meeting

Discussion items

Discussed | Item | Notes
(tick)Project news

Fritz Mueller:

  • Getting ready for JSRs (in Tucson after PCW)
  • Preparing a small presentation
  • Fabrice is moving on from Qserv as of December 2023
    • we need to discuss the transition plan (code, documentation, procedures, etc.)
    • Fabrice will still be available to help the team if needed

Colin Slater:

Vacations, etc.:

  • Igor will be gone next week
  • Andy S is camping in the Sierra through the end of the week
(tick)Status of Qserv at IDF

Fritz Mueller:

  • a new production release was built 
    • 2023.7.1-rc3 
    • the release doesn't include the file-based result delivery protocol
    • deployed on qserv-int 
  • case sensitivity in queries involving the director index (AKA "object identifier")
    • hit this problem before with both the director index and the ref-match tables
    • we need to decide how to address this problem
    • Fritz Mueller will make a plan next week
  • Performance issues with the query history
    • Long period during boot-up when Qserv is unable to process queries. This can last several minutes.
    • A source of the problem is the full-text index (from a few to many GB) that MariaDB needs to process
    • Fritz Mueller emptied the Query history and archived it
    • Fritz Mueller proposed:
      • In production, we need to implement some kind of "rolling cleanup" to archive and empty the history
      • Do we really need the full-text indexes?
        • Igor Gaponenko:
          • proposed to disable the index and remove the relevant query search option from the Dashboard due to its limited use and slow performance
          • DM-40126
    • Colin Slater:
      • LSST doesn't have a formal requirement to keep the query history in Qserv
    • Fritz Mueller:
      • Allow seeing Qserv load (the number of queries, etc.) in Grafana
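
The "rolling cleanup" idea proposed above could look roughly like the sketch below. It is only an illustration under stated assumptions: the table names (query_history, query_history_archive), the submitted column, and the use of SQLite as a stand-in for MariaDB are all hypothetical and do not reflect the actual Qserv schema. Rows older than a cutoff are copied to an archive table and deleted from the live table in small batches, so that no single transaction is large.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with hypothetical live and archive tables."""
    conn = sqlite3.connect(":memory:")
    schema = "(id INTEGER PRIMARY KEY, submitted INTEGER, query TEXT)"
    conn.execute("CREATE TABLE query_history " + schema)
    conn.execute("CREATE TABLE query_history_archive " + schema)
    # Ten demo rows with submission times 10, 20, ..., 100.
    conn.executemany(
        "INSERT INTO query_history VALUES (?, ?, ?)",
        [(i, i * 10, f"SELECT {i}") for i in range(1, 11)],
    )
    conn.commit()
    return conn


def rolling_cleanup(conn, cutoff, batch_size=500):
    """Move rows with submitted < cutoff to the archive, one batch at a time.

    Returns the number of rows moved.
    """
    moved = 0
    while True:
        rows = conn.execute(
            "SELECT id FROM query_history WHERE submitted < ? ORDER BY id LIMIT ?",
            (cutoff, batch_size),
        ).fetchall()
        if not rows:
            return moved
        ids = [row[0] for row in rows]
        marks = ",".join("?" * len(ids))
        with conn:  # one transaction per batch: copy, then delete
            conn.execute(
                "INSERT INTO query_history_archive "
                f"SELECT * FROM query_history WHERE id IN ({marks})",
                ids,
            )
            conn.execute(f"DELETE FROM query_history WHERE id IN ({marks})", ids)
        moved += len(ids)
```

Such a function could be run periodically (for example, from a scheduled job) with the cutoff set to "now minus the retention period"; the retention period itself was not decided at the meeting.
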
(tick)Status of Qserv at USDF

Igor Gaponenko:

  • Qserv is up and fully functioning
  • The prod instance needs to be upgraded to the latest Qserv release
    • 2023-07-19: updated to 2023.7.1-rc3
  • "Mobu" needs to be restarted
    • 2023-07-19: verified that "Mobu" is running
  • Fritz Mueller:
(tick)Status of DP0.3

Fritz Mueller:

  • The new input files are here
  • Colin Slater made several adjustments to the DDL (schema)
  • Colin Slater: PCW will be the target release time for the database
    • Fritz Mueller: realistically, it will be worked on this week or next
(minus)

"New" Qserv


Igor Gaponenko to report on the status of the development, and the next steps:

Query cancellation:

  • Igor Gaponenko:
    • Neither of the (above-mentioned) PRs solves the problems with worker-side query cancellation
    • The second PR adds an additional trigger
  • Fritz Mueller:
    • it works well on Czar 
    • query cancellation on workers is a problem
    • worker-side MySQL query cancellation isn't presently working
  • John Gates:
    • We may need to implement result processing prioritization based on the amount of data in the result sets (for the scan queries only!)
    • One option would be to delay pulling results until 10% of the chunk results are ready at workers (the file-based protocol allows getting the notifications on those w/o pulling actual data from workers)
    • Igor Gaponenko: this could be further optimized based on the amount of data reported in the first batches of worker responses
    • Fritz Mueller:
      • There is an interesting and confusing effect reported by users: in some cases, highly constrained queries may take either a minute or 40 minutes
      • I would like that to be solved if possible.
      • John Gates:
        • unfortunately, this is unavoidable in Qserv if such a query goes into the scan queue
      • Fritz Mueller:
        • query classification (at workers) is based on the chunk "difficulty" (which tables it involves, whether the query involves a JOIN, etc.)
        • there is some scheduling strategy that we could exploit/improve here
        • we need to discuss it further to come up with a consensus
    • Colin Slater:
      • is there any difference between the "medium" and "fast" queues?
      • John Gates:
        • the queues are the same
        • the decision is based on the query complexity
        • we try to load a table into memory once and benefit from that when processing queries touching those chunks
      • Fritz Mueller: actually, the question here is whether a query with a small number of chunks should go into some specific queue.
  • Next steps:
    • Colin Slater will look at the Google monitoring pages to see if there are any correlations among CPU, disk I/O, and memory usage
    • Igor Gaponenko will run various tests at USDF w/ and w/o locking tables in memory to see the effect of these optimizations
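
The result-pulling idea discussed above (wait until 10% of a query's chunk results are ready, then prioritize by the amount of data) can be sketched as follows. This is a minimal illustration, not Qserv code: the QueryProgress class, its fields, and pull_order are hypothetical names, and the choice to serve the smallest estimated result first is an assumption, since the notes do not say in which direction the prioritization should go. The total size is extrapolated from the chunks reported so far, in the spirit of the "first batches of worker responses" suggestion.

```python
from dataclasses import dataclass


@dataclass
class QueryProgress:
    """Per-query bookkeeping built from worker 'result ready' notifications."""
    query_id: int
    total_chunks: int
    ready_chunks: int = 0
    ready_bytes: int = 0

    def on_chunk_ready(self, nbytes):
        # Called when a worker reports that one chunk result file is ready.
        self.ready_chunks += 1
        self.ready_bytes += nbytes

    def ready_fraction(self):
        return self.ready_chunks / max(self.total_chunks, 1)

    def estimated_total_bytes(self):
        # Extrapolate from the chunks seen so far; requires ready_chunks > 0.
        return self.ready_bytes * self.total_chunks / self.ready_chunks


def pull_order(queries, min_fraction=0.10):
    """Return queries eligible for pulling, smallest estimated result first."""
    eligible = [
        q for q in queries
        if q.ready_chunks > 0 and q.ready_fraction() >= min_fraction
    ]
    return sorted(eligible, key=lambda q: q.estimated_total_bytes())
```

In this sketch a query that has not yet reached the threshold is simply skipped until more notifications arrive; nothing is pulled from the workers to make that decision.
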

Next steps to test this version after merging into the main branch:

  • Deploy in -prod at USDF for background testing
  • deploy in -int at IDF?
    • Changes in qserv-operator are needed to support XROOTD file resources and worker-side temporary folders for storing intermediate result files
    • Fritz Mueller (action items):
      • will have a look at the e2e test
      • we will build a separate Qserv release
    • Fritz Mueller:
      • we need to merge the operator into the Qserv repository
      • this simplifies tests
      • at some point, we will be building different containers

Colin Slater:

  • We could stream user queries to be run in parallel in two versions of Qserv 
  • Igor Gaponenko: we could "steal" the query history from IDF and "replay" it at USDF
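
A replay of a recorded query history could be driven by something like the sketch below. It assumes the history is available as (timestamp in seconds, query text) pairs and that queries are sent through a caller-supplied submit function; both are hypothetical, as the notes do not describe the export format or the client interface. The original spacing between queries is preserved, optionally compressed by a speedup factor.

```python
import time


def replay_schedule(history, speedup=1.0):
    """Return (delay_seconds, query) pairs, with delays relative to replay start."""
    ordered = sorted(history, key=lambda rec: rec[0])
    if not ordered:
        return []
    t0 = ordered[0][0]
    return [((t - t0) / speedup, query) for t, query in ordered]


def replay(history, submit, speedup=1.0, sleep=time.sleep, clock=time.monotonic):
    """Submit each query at its scheduled offset from the start of the replay."""
    start = clock()
    for delay, query in replay_schedule(history, speedup):
        remaining = delay - (clock() - start)
        if remaining > 0:
            sleep(remaining)
        submit(query)  # hypothetical callable that sends the query to Qserv
```

Because submit is called sequentially, a blocking submit would not reproduce the concurrency of the original load; a non-blocking submit (or a thread pool) would be needed for that.
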

Next code development steps:

(tick)Ingesting user-generated data products into Qserv

Fritz Mueller: any word from the RSP team and GPDF on the requirements for the API?

  • We may have a chance to further discuss it at PCW

Igor Gaponenko: browsed the source code tree of the TAP service 

Action items

  •