Date

Attendees

Goals

Discussion items

Time | Item | Who | Notes

A proposal for configuring Qserv containers

Ongoing tickets that are relevant in this context:


Configuring and adding workers to Qserv cluster

Fabrice Jammes raised the topic of updating the configuration of the Replication/Ingest system at run time. This is needed for two reasons:

  • for registering workers at the startup time of Qserv
  • for scaling up an existing cluster

Igor Gaponenko reported that there is an ongoing effort to improve the situation here. The first step is to migrate the worker services (specifically cmsd and the Replication system's worker) so that they configure themselves, learning their identities from the unique (UUID-generated) dataset identifiers stored in the corresponding Qserv worker databases. For further details and the current status of this development see:

The second step will be to change the Replication system's communication network so that workers can log into a (yet to be implemented) redirector service. This will reverse the dependencies within the system and eliminate the need for explicit configuration of the workers. A preliminary plan for this development was discussed between Igor Gaponenko, Fritz Mueller and Andy Salnikov before the Winter break. This project is still at an early stage. The actual work on it will start after Fabrice Jammes finishes migrating Qserv to the lite containers and their entry points.
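The first step described above, in which a worker learns its identity from an identifier stored in its own database, could look roughly like the sketch below. This is only an illustration: SQLite stands in for the Qserv worker database, and the table name `worker_identity` and the function `load_or_create_worker_id` are hypothetical and are not taken from the Qserv code.

```python
import sqlite3
import uuid


def load_or_create_worker_id(conn: sqlite3.Connection) -> str:
    """Return the identifier stored in the worker database, creating it once."""
    # Hypothetical single-row table; the real Qserv schema is not given in these notes.
    conn.execute("CREATE TABLE IF NOT EXISTS worker_identity (id TEXT NOT NULL)")
    row = conn.execute("SELECT id FROM worker_identity LIMIT 1").fetchone()
    if row is not None:
        # The identity already exists, so every service sharing this database sees the same value.
        return row[0]
    new_id = str(uuid.uuid4())
    conn.execute("INSERT INTO worker_identity (id) VALUES (?)", (new_id,))
    conn.commit()
    return new_id
```

With this kind of arrangement the identity travels with the data rather than with an external configuration file, so a service only needs access to its local database to find out which worker it is.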


Schema initialization and migration

The topic was mentioned only briefly in the context of the Qserv configuration discussion, since the two overlap. It was decided to postpone the discussion until the next meeting.

Status report on testing

Lockups are seen in the latest version of the branch when testing mixed query loads in the large Qserv cluster at NCSA. Two types of queries are launched simultaneously in this round of tests:

  • one or two unconditional queries like SELECT * FROM database.table LIMIT 1
  • 100 or 200 near-neighbor queries, each covering 1 to 7 chunks

The lockup occurs within a few minutes of launching the queries. The problem is reproducible.
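For anyone who wants to reproduce a load of this shape, the mixed workload can be sketched as follows. This is a minimal illustration and not the actual test harness: the functions `build_mixed_load` and `launch_all` are hypothetical, the near-neighbor SQL is passed in by the caller because its text is not given in these notes, and `run_query` is whatever callable submits one query to Qserv.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def build_mixed_load(num_unconditional: int, num_near_neighbor: int,
                     near_neighbor_sql: str) -> List[str]:
    """Build the list of queries for one round of the mixed-load test."""
    unconditional = ["SELECT * FROM database.table LIMIT 1"] * num_unconditional
    near_neighbor = [near_neighbor_sql] * num_near_neighbor
    return unconditional + near_neighbor


def launch_all(queries: List[str], run_query: Callable[[str], object]) -> List[object]:
    """Submit all queries at once, one thread per query, and collect the results."""
    with ThreadPoolExecutor(max_workers=len(queries)) as pool:
        futures = [pool.submit(run_query, q) for q in queries]
        # result() blocks until each query finishes; a lockup shows up as a call that never returns.
        return [f.result() for f in futures]
```

A round matching the description above would be `launch_all(build_mixed_load(2, 100, sql), run_query)`, with `sql` and `run_query` supplied by the tester.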

Details were posted in the last comment to the ticket:

The direct link to the most relevant comment: https://jira.lsstcorp.org/browse/DM-31537?focusedCommentId=443619&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-443619

Action items