Date

05 Jan 2022

Attendees

Igor Gaponenko, Fritz Mueller, Andy Salnikov, Fabrice Jammes, Unknown User (npease), John Gates, Kian-Tat Lim

Goals

Discuss Fritz Mueller's proposal for configuring Qserv containers.

Discussion items

Item	Who	Notes
A proposal for configuring Qserv containers	Fritz Mueller	Fritz Mueller presented a proposal: "DRAFT: configuring Qserv containers". Kian-Tat Lim prepared an updated version of the proposal based on the subsequent discussion among the team members: "DISCUSSED: configuring Qserv containers". Ongoing tickets that are relevant in this context:
Configuring and adding workers to Qserv cluster	Fabrice Jammes, Igor Gaponenko	Fabrice Jammes raised the topic of updating the configuration of the Replication/Ingest system at run time. This is needed for two reasons: (1) registering workers at Qserv startup time, and (2) scaling up an existing cluster. Igor Gaponenko reported that there is an ongoing effort to improve the situation. The first step is to migrate the worker services (specifically cmsd and the Replication system's worker) to configure themselves, that is, to learn their identities, from the unique (UUID-generated) dataset identifiers stored in the corresponding Qserv worker databases. For further details and the current status of this development see: The second step will be to change the Replication system's communication network so that workers log into a (yet to be implemented) redirector service. This will reverse the dependencies within the system and eliminate the need for explicit configuration of the workers. A preliminary plan for this development was discussed between Igor Gaponenko, Fritz Mueller, and Andy Salnikov before the winter break. This project is still at an early stage. The actual work on it will start after Fabrice Jammes finishes migrating Qserv to the lite containers and their entry points.
Schema initialization and migration	Fritz Mueller, Fabrice Jammes	The topic was mentioned only briefly in the context of the Qserv configuration discussion because the two topics overlap. It was decided to postpone the discussion until the next meeting.
Status report on testing	Igor Gaponenko, John Gates	Lockups are seen in the latest version of the branch when testing mixed query loads in the large Qserv cluster at NCSA. Two types of queries are launched simultaneously in this round of tests: (1) one or two unconditional queries like `SELECT * FROM database.table LIMIT 1`, and (2) 100 or 200 near-neighbor queries, each covering 1 to 7 chunks. The lockup happens shortly (a few minutes) after the queries are launched. The problem is reproducible. Details were posted in the last comment to the ticket: The direct link to the most relevant comment: https://jira.lsstcorp.org/browse/DM-31537?focusedCommentId=443619&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-443619

Action items

John Gates will work with Igor Gaponenko (if needed) to investigate the lockups.
Fritz Mueller will lead a discussion on initializing and upgrading Qserv schemas at the next meeting. This will be preceded by a discussion among interested members of the group in the team's Slack channel.
Igor Gaponenko will look into migrating the configuration of the Replication/Ingest system from database tables to a more conventional technique.
Fabrice Jammes will finish migrating the operator-based Qserv deployment tools to the lite containers and the new configuration model.
Unknown User (npease) will finish improving the parameter handling in the entry points as per Fritz Mueller's proposal.