03 Aug 2022

Attendees

Igor Gaponenko Colin Slater Fabrice Jammes Andy Salnikov John Gates Fritz Mueller

Last week's meeting notes:

...

Time Item Who Notes

Project news

Fritz Mueller

DM all hands in Chile has been moved to March 2023

PCW2022 is coming.

JSRs are coming after PCW2022. People are going to be preoccupied with that.

News from ORA: recognizing/rewarding people for work on DP01 (not DP02). E-mails with invitations have been sent to the relevant folks. This is scheduled for August 16th.

Status of DP02

Colin Slater has the complete set of Parquet files for ForcedSourceOnDiaObject.

Igor Gaponenko will spend the next 24+ hours preprocessing the Parquet.

Fritz Mueller suggested ingesting into both qserv-int and qserv-prod in parallel, since scientists may be interested in seeing this table in Qserv before PCW2022.

Fritz Mueller On RefMatch tables:

The problem has been traced to case sensitivity in the RelatonalGraph implementation.
This will be fixed and deployed soon.
Still waiting for DM-35578.
In the meantime, CSS will be modified manually.

Igor Gaponenko will need to tell Fabrice Jammes where to find the new version of ForcedSourceOnDiaObject and the truth tables (as RefMatch), and provide the relevant instructions on the schema and CSS configurations.

Case sensitivity in Qserv

team

Context:

Christine ran into some problems with Qserv in the past.
Qserv is case-sensitive on database and table names.
There is a JIRA ticket related to this: DM-16709

Fritz Mueller thinks he knows how to implement the case-insensitive front-end for incoming queries

Igor Gaponenko noted that some user queries are also case-sensitive with respect to column names. This includes queries that mention the primary key of the director table.
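One possible shape for a case-insensitive front-end is sketched below: names in an incoming query are matched against the known canonical names without regard to case, and an exact match always wins. This is only an illustration, not Qserv code; `build_index`, `resolve` and `AmbiguousNameError` are hypothetical names, and the catalog contents are made up apart from ForcedSourceOnDiaObject.

```python
class AmbiguousNameError(LookupError):
    """Raised when a name matches several canonical names that differ only by case."""


def build_index(canonical_names):
    """Map each case-folded name to the list of canonical spellings."""
    index = {}
    for name in canonical_names:
        index.setdefault(name.casefold(), []).append(name)
    return index


def resolve(name, index):
    """Return the canonical spelling of `name`, ignoring case.

    An exact match is preferred, so existing case-sensitive queries
    keep working even if two names differ only by case.
    """
    matches = index.get(name.casefold(), [])
    if not matches:
        raise KeyError(name)
    if name in matches:
        return name
    if len(matches) > 1:
        raise AmbiguousNameError(f"{name!r} matches {sorted(matches)}")
    return matches[0]


# Illustrative catalog; only ForcedSourceOnDiaObject comes from these notes.
tables = build_index(["Object", "ForcedSourceOnDiaObject"])
print(resolve("forcedsourceondiaobject", tables))
```

The same lookup could be applied to database, table and column names, each with its own index. Catalogs that already contain names differing only by case would need an explicit policy, which is why the sketch raises an error rather than guessing.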

Load testing of Qserv

Fritz Mueller

Context reported by Fritz Mueller :

The JDBC client library does not cancel synchronous queries when the client disconnects from the TAP service.
We have to address these issues.
In the meantime, the testing needs to be postponed until the problems are understood and fixed.

Andy Salnikov: a connection from a client to the proxy results in another connection to MySQL.

This needs to be investigated.

Fritz Mueller: another issue is that disconnects from Qserv after 8 hours have been observed. It turns out this is exactly the timeout set in the czar's MySQL service.

Fritz Mueller: tried to increase that timeout. This didn't help.

Andy Salnikov: it's possible this could be fixed by setting a proper interactive timeout (interactive_timeout) or wait timeout (wait_timeout) on the czar's MySQL server. Another idea is to check what mysql-proxy thinks about the timeout:

For timeouts, it could be interesting to run SHOW SESSION VARIABLES LIKE 'wait_timeout' through the proxy.

Fritz Mueller will investigate this further.
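The check suggested by Andy Salnikov could be scripted roughly as follows: read the session timeouts once over a direct connection to the czar's MySQL server and once through the proxy, then report any differences. This is a sketch under assumptions: `fetch_timeouts` and `compare_timeouts` are hypothetical helpers, `conn` is any Python DB-API 2.0 connection (the driver, hosts and ports are not specified in these notes), and it is assumed that the proxy passes SHOW statements through unchanged. For reference, 8 hours is 28800 seconds, which is MySQL's default for both wait_timeout and interactive_timeout.

```python
def fetch_timeouts(conn):
    """Return {variable_name: seconds} for the session timeouts seen on `conn`.

    `conn` is assumed to be a DB-API 2.0 connection to a MySQL-compatible
    endpoint (the czar's MySQL server or the proxy).
    """
    timeouts = {}
    cur = conn.cursor()
    try:
        for name in ("wait_timeout", "interactive_timeout"):
            cur.execute(f"SHOW SESSION VARIABLES LIKE '{name}'")
            row = cur.fetchone()  # (Variable_name, Value) or None
            if row is not None:
                timeouts[row[0]] = int(row[1])
    finally:
        cur.close()
    return timeouts


def compare_timeouts(direct, via_proxy):
    """Return {name: (direct_value, proxy_value)} for settings that differ."""
    names = set(direct) | set(via_proxy)
    return {
        name: (direct.get(name), via_proxy.get(name))
        for name in names
        if direct.get(name) != via_proxy.get(name)
    }
```

An empty result from `compare_timeouts` would suggest the proxy session sees the same values as a direct session, pointing the investigation elsewhere.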

Fritz Mueller: there was some confusion about what queue was used for processing queries at workers. Discussed it with John Gates.

What would be the next steps in investigating disconnects and timeouts?

News on qserv-ingest

Fabrice Jammes

Issues caused by the timeouts have been fixed.

The next step will be to ingest DP02 into qserv-dev using qserv-ingest.

Fritz Mueller would like to fix the LSST Logger configuration in qserv-operator.

Instabilities in the Kubernetes-based CI

team

Context:

There is a chance CI may fail to ingest catalogs because some of the workers aren't ready (have not reported "for duty" to the Replication Controller).
This creates a spectrum of problems.

The readiness probe, based on Kubernetes' ability to monitor pods, has been in place for many months. The CI blocks until all pods report as ready.

Fritz Mueller: formally this looks good. The problem is in the fidelity of the application's status. It may take more time for the application to stabilize itself.

Igor Gaponenko proposed two solutions:

Ask the Replication Controller which workers (and how many of them) have connected. There is a REST service for that. Unfortunately, the service has a bug that needs to be fixed. See: DM-35774
Or, add an option to the Controller to wait until a quorum is formed (the required number of workers have connected to the Controller).
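Either proposal amounts to waiting until enough workers have connected before CI starts ingesting. A minimal sketch of that wait loop is shown below. It is not the Controller's actual interface: `wait_for_quorum` is a hypothetical helper, and `get_connected_count` stands in for whatever call ends up reporting the number of connected workers (for example, a wrapper around the REST service once the bug is fixed).

```python
import time


def wait_for_quorum(get_connected_count, required, timeout_s=300.0,
                    interval_s=5.0, clock=time.monotonic, sleep=time.sleep):
    """Block until at least `required` workers are connected.

    `get_connected_count` is a caller-supplied function returning the
    current number of connected workers. Returns the observed count, or
    raises TimeoutError if the quorum is not reached within `timeout_s`.
    """
    deadline = clock() + timeout_s
    while True:
        connected = get_connected_count()
        if connected >= required:
            return connected
        if clock() >= deadline:
            raise TimeoutError(
                f"only {connected} of {required} workers connected "
                f"after {timeout_s} s"
            )
        sleep(interval_s)
```

In CI this would run after the pods report as ready and before the ingest step, so that a pod that is "ready" for Kubernetes but not yet registered with the Controller does not cause the ingest to fail.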
