Date

Attendees

Goals

  • Please register topics below

Discussion items

Time | Item | Who | Notes

Project news

Progress since the previous meeting (Database meeting 2022-03-09)

Grand Unified Repo:

  • The PR for merging qserv_testdata is still under review (DM-33618)

Database init:

  • TBC

Subchunk sizes and overlap:

  • the problem pertains to high-density catalogs
  • the goal is to decrease the number of rows in each subchunk to improve the performance of cross-joins in N-N queries
  • profile the queries on the existing catalog first (Fritz Mueller)
  • Igor Gaponenko proposed using the existing Data Exportation service of the Replication/Ingest system to extract a subset of chunks from the existing kpm50 catalog, where the problem is seen, and reingest these data into the same 
  • Fritz Mueller will investigate further and schedule the work based on existing priorities.
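
A rough illustration of why fewer rows per subchunk should help: in a naive cross-join, the number of row pairs examined grows with the square of the rows in a subchunk. The sketch below is a simplified model only, not Qserv code. It assumes uniform density, and the function names and the `overlap_fraction` parameter are hypothetical (the overlap is held at a fixed fraction of the subchunk rows for simplicity).

```python
def pair_comparisons(rows_per_subchunk: int, overlap_rows: int = 0) -> int:
    """Row pairs examined when one subchunk is cross-joined with
    itself plus its overlap region (simplified model)."""
    return rows_per_subchunk * (rows_per_subchunk + overlap_rows)


def total_comparisons(total_rows: int, num_subchunks: int,
                      overlap_fraction: float = 0.0) -> int:
    """Total pairs over all subchunks, assuming rows are spread evenly.

    overlap_fraction is a hypothetical knob: overlap rows expressed as a
    fraction of the rows in one subchunk.
    """
    rows = total_rows // num_subchunks
    overlap = int(rows * overlap_fraction)
    return num_subchunks * pair_comparisons(rows, overlap)


if __name__ == "__main__":
    coarse = total_comparisons(1_000_000, 100)
    fine = total_comparisons(1_000_000, 400)
    print(coarse, fine, coarse / fine)
```

In this model, splitting the same rows into four times as many subchunks cuts the pair count by a factor of four. In practice the overlap share grows as subchunks shrink, which is why profiling on the real catalog comes first.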

Optimizations in processing results of the N-N queries

The context:


Update on worker load imbalance problem (Fritz Mueller)

The context was set in the previous meeting (see Database meeting 2022-03-09):

  • the problem seems to be XRootD version dependent (Andy Hanushevsky's help is needed here)
  • Andy Hanushevsky still needs to see the logs from the redirector and from one of the workers to understand what is going on. John Gates will take care of this.

Andy Hanushevsky:

  • affinity works fine until an overload happens. After that, XRootD begins shifting chunk requests to other workers. This explains the linear behavior.

Resolution:


IDF worker crash this morning (Fritz Mueller)

Context:

How do we investigate this?

  • Andy Hanushevsky: inspect the log files to see which service has the wrong address
  • Andy Hanushevsky's theory is that we may have some "rogue" service in Qserv using the wrong IP address

Possible short-term solutions:

  • coordinate GKE upgrades with complete restarts of Qserv 

There is a potentially related issue that shows up in the worker logs as follows:

lsst.qserv.wdb.ChunkResource WARN: memLockStatus unexpected results, assuming LOCKED_OTHER. err=Error 0: Expecting one row, found no rows
lsst.qserv.wdb.ChunkResource WARN: Memory tables were not released cleanly! LockStatus=1

Further investigation shows that these harmless messages are posted by:

  • wdb/SQLBackend
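
To gauge how often these warnings appear on a given worker, the log could be tallied with a small script. This is a minimal sketch: the patterns are taken from the two messages quoted above, while the function name and the counter keys are hypothetical.

```python
import re
from collections import Counter

# Patterns copied from the two warnings quoted in the notes.
PATTERNS = {
    "memLockStatus_unexpected": re.compile(
        r"ChunkResource WARN: memLockStatus unexpected results"),
    "memory_tables_not_released": re.compile(
        r"ChunkResource WARN: Memory tables were not released cleanly"),
}


def count_warnings(lines):
    """Count occurrences of each known warning in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    import sys
    # Usage: python count_warnings.py < worker.log
    for name, n in count_warnings(sys.stdin).items():
        print(f"{name}: {n}")
```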

Refactoring qserv-ingest

Work on modifying the workflow to use the ASYNC ingest service is still in progress.

The SYNC mode worked successfully for ingesting a 50 TB catalog.


Refactoring qserv-operator 

Context:

Fabrice Jammes is still working on the Operator to integrate the change.

Action items

  •