Igor Gaponenko to report on the status of the development, and the next steps:
- DM-38069
- DM-39819
Query cancellation:
- Igor Gaponenko:
- Neither of the above-mentioned PRs solves the problems with worker-side query cancellation
- The second PR adds an additional trigger
- Fritz Mueller:
- it works well on Czar
- query cancellation on workers is a problem
- worker-side MySQL query cancellation isn't presently working
- John Gates:
- We may need to implement result processing prioritization based on the amount of data in the result sets (for the scan queries only!)
- One option would be to delay pulling results until 10% of the chunk results are ready at workers (the file-based protocol allows getting the notifications on those w/o pulling actual data from workers)
- Igor Gaponenko: this could be further optimized based on the amount of data reported in the first batches of worker responses
- Fritz Mueller:
- There is an interesting and confusing effect reported by users: in some cases, highly constrained queries may take either a minute or 40 minutes
- I would like that to be solved if possible.
- John Gates:
- unfortunately, this is unavoidable in Qserv if such a query goes into the scan queue
- Fritz Mueller:
- query classification (at workers) is based on the chunk "difficulty" (what tables it involves, whether the query involves a JOIN, etc.); there is some scheduling strategy that we could exploit/improve here
- we need to discuss it further to come up with a consensus
- Colin Slater:
- is there any difference between the "medium" and "fast" queues?
- John Gates:
- the queues are the same
- a decision is made on the query complexity
- we try to load a table in memory once and benefit from that when processing queries touching those chunks
- Fritz Mueller: actually, the question here is whether a query with a small number of chunks should go into some specific queue.
- Next steps:
- Colin Slater will look at the Google monitoring pages to see if there are any correlations between CPU, disk I/O, and memory usage
- Igor Gaponenko will run various tests at USDF w/ and w/o locking tables in memory to see the effect of these optimizations
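John Gates's suggestion (hold off pulling results until 10% of a query's chunk results are ready, then prioritize by the amount of data reported so far) could be sketched roughly as follows. This is only an illustration under stated assumptions: `QueryProgress`, `notify`, and `pull_order` are hypothetical names, not part of Qserv, and the sketch assumes each worker notification carries the size of the chunk result.

```python
from dataclasses import dataclass, field


@dataclass
class QueryProgress:
    """Hypothetical czar-side bookkeeping for one scan query (not a Qserv class)."""
    query_id: int
    total_chunks: int  # assumed to be > 0
    # chunk id -> result size in bytes, as reported in worker notifications
    ready: dict[int, int] = field(default_factory=dict)

    def notify(self, chunk_id: int, result_bytes: int) -> None:
        """Record a 'result ready' notification; no data is pulled here."""
        self.ready[chunk_id] = result_bytes

    def ready_fraction(self) -> float:
        """Fraction of this query's chunks whose results are ready at workers."""
        return len(self.ready) / self.total_chunks

    def estimated_total_bytes(self) -> float:
        """Extrapolate the full result size from the chunks reported so far."""
        if not self.ready:
            return 0.0
        mean_bytes = sum(self.ready.values()) / len(self.ready)
        return mean_bytes * self.total_chunks


def pull_order(queries, min_ready_fraction=0.10):
    """Return queries eligible for pulling, smallest estimated result first."""
    eligible = [q for q in queries if q.ready_fraction() >= min_ready_fraction]
    return sorted(eligible, key=lambda q: q.estimated_total_bytes())
```

Sorting by the extrapolated size corresponds to Igor Gaponenko's remark about using the first batches of worker responses; whether small results should actually go first is a policy choice that the meeting left open.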
Next steps to test this version after merging into the main branch:
- Deploy in -prod at USDF for background testing
- Deploy in -int at IDF?
- Changes in qserv-operator are needed to support XROOTD file resources and worker-side temporary folders for storing intermediate result files
- Fritz Mueller (action items):
- will have a look at the e2e test
- we will build a separate Qserv release
- Fritz Mueller:
- we need to merge the operator into the Qserv repository
- this simplifies tests
- at some point, we will be building different containers
Colin Slater:
- We could stream user queries to be run in parallel in two versions of Qserv
- Igor Gaponenko: we could "steal" the query history from IDF and "replay" it at USDF
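A minimal sketch of the "replay" idea, assuming the query history can be exported as `(submit_time_in_seconds, sql)` pairs. The function name `replay`, its parameters, and the record layout are hypothetical; the actual history format at IDF was not discussed.

```python
import time
from typing import Callable, Iterable, Tuple


def replay(history: Iterable[Tuple[float, str]],
           submit: Callable[[str], None],
           speedup: float = 1.0,
           sleep: Callable[[float], None] = time.sleep) -> int:
    """Resubmit recorded queries in time order, preserving their relative spacing.

    `submit` is whatever sends one SQL string to the target deployment;
    `speedup` > 1 compresses the recorded gaps between queries.
    Returns the number of queries replayed.
    """
    records = sorted(history)  # order by recorded submit time
    previous = None
    for submitted_at, sql in records:
        if previous is not None:
            # Wait for the (scaled) gap that separated the two queries originally.
            sleep(max(0.0, (submitted_at - previous) / speedup))
        submit(sql)
        previous = submitted_at
    return len(records)
```

For Colin Slater's variant, `submit` could forward each query to two Qserv versions so that their behavior can be compared on the same stream.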
Next code development steps:
- DM-40003