
Date

Attendees

Goals

  • See a list of topics below

Discussion items

Time | Item | Who | Notes
Project news

Fritz Mueller Remote work location/permission forms must be filled out before the end of this week (Friday).

Fritz Mueller A project management meeting is going on in Tucson right now. Concerns were raised about multi-site processing.

Presentations made at last week's 2nd Data Facilities Planning Workshop - 2022-01-19/20 are linked to that page. Fritz Mueller's notes on the Workshop:

  • NCSA is decommissioning LSST-allocated hardware at the end of October.
  • ComCam is the main focus. This summer we expect enough k8s nodes at USDF and IDF.
  • Qserv wasn't specifically mentioned.

Fritz Mueller Replacement for NCSA resources (for Qserv): mentioned the problem to Will, who agreed that IDF could be used for testing Qserv at scale if needed after we lose NCSA. Qserv doesn't worry LSST management because it is primarily required for the final stages of DRP. The team still needs sufficient resources to test scalability, though.

Igor Gaponenko we need access to the Kubernetes infrastructure at USDF as early as possible to ensure that its quality, stability, and performance are adequate for Qserv.

  • Fritz Mueller SLAC experience with this technology is rather limited (realistically speaking, just one person: Yee). NCSA was managing Kubernetes manually, and SLAC is going to do the same. Paying for external support (Google?) for Kubernetes infrastructure is expensive.

Andy Hanushevsky discussed using polling with Rucio. Another interesting idea was to use Kafka in Rucio.

Fritz Mueller mentioned that Andy Salnikov's work on APDB would be affected by the OGA's "rack" at USDF, which is supposed to be commissioned before this summer (for ComCam). No hardware has been ordered yet. This creates uncertainty in planning.


Getting ready for DP0.2

Fritz Mueller commented on plans for loading a slice of DP0.2 into IDF. We're past the milestone by 1 month. Reason: delayed migration of the Kubernetes-based deployments to the new (*lite*) containers. We still see some bumps there. Options:

  • wait another week until the new version of the operator and ingest is ready,
  • or use the existing Qserv instances at IDF and the existing ingest tools for that.

Fritz Mueller would like to retain existing data during Qserv upgrade and avoid reloading all (including existing) data from scratch as proposed by Fabrice Jammes.

  • MariaDB and schema upgrade problems need to be addressed here.
  • We need to make progress on this (upgrade) path.

Igor Gaponenko DP0.2 comes with an extended schema (compared with DP0.1) which may need to be tested. Do we have the schema? Do we have sample data? Shall we test-load a slice of DP0.2 at NCSA first?

  • Fritz Mueller Both are available. The data exist as Parquet files that need to be translated into the CSV/TSV format. The data may be on a shared filesystem at NCSA. We need to contact Hsin-Fang Chiang to figure out where to find them.
  • Fritz Mueller loading into the small cluster at NCSA is a good idea. We would have to do it anyway.
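
The Parquet-to-TSV translation mentioned above could look roughly like the following. This is a minimal sketch, not the team's actual conversion tool: the function names are hypothetical, the Parquet reading relies on the third-party pyarrow package (a reasonably recent version is assumed), and writing NULLs as \N follows the default convention of MariaDB's LOAD DATA INFILE, which is assumed (not confirmed by these notes) to be what the ingest tools expect. No header row is written.

```python
import io


def format_field(value):
    """Render one value as a TSV field; None becomes \\N (NULL for LOAD DATA)."""
    if value is None:
        return r"\N"
    text = str(value)
    # Escape the backslash first, then the characters that would break a TSV row.
    return text.replace("\\", "\\\\").replace("\t", "\\t").replace("\n", "\\n")


def write_tsv(rows, columns, out):
    """Write dict rows to a text stream, one tab-separated line per row."""
    count = 0
    for row in rows:
        out.write("\t".join(format_field(row.get(c)) for c in columns) + "\n")
        count += 1
    return count


def parquet_to_tsv(parquet_path, tsv_path):
    """Stream a Parquet file to TSV in batches (needs the third-party pyarrow)."""
    import pyarrow.parquet as pq  # imported lazily so write_tsv works without it

    parquet_file = pq.ParquetFile(parquet_path)
    columns = parquet_file.schema_arrow.names
    total = 0
    with open(tsv_path, "w", newline="") as out:
        for batch in parquet_file.iter_batches():
            total += write_tsv(batch.to_pylist(), columns, out)
    return total


# Small in-memory demonstration with made-up rows (no Parquet file needed).
demo = io.StringIO()
write_tsv(
    [{"id": 1, "ra": 10.5, "name": None}, {"id": 2, "ra": None, "name": "a\tb"}],
    ["id", "ra", "name"],
    demo,
)
```

Reading in batches keeps memory use bounded for large catalog files. Column order follows the Parquet schema, so it would still have to be checked against the DP0.2 table schema before loading.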

Problems with Python 3.10, SQLAlchemy, and MariaDB 10.5 in the Kubernetes-based ingest workflow.

Discussed Fritz Mueller's idea of putting Qserv and the operator into a single Git package.

Fabrice Jammes Qserv ingest is broken after migrating to the new version of MariaDB. A problem exists with the validation stage. The loading itself works.

  • There is a hack that might work around the problem: removing SQLAlchemy and replacing it with a simple tool.

Igor Gaponenko can it all be written in GoLang instead of Python?

  • Fritz Mueller thinks that GoLang should not be used for any projects beyond the operator.

Igor Gaponenko can we have a single base container for everything in Qserv, including Kubernetes-based ingest?

Igor Gaponenko can we turn Fabrice Jammes's ingest framework into a ready-to-use tool that would be configurable by users to ingest their data? It would be easier for the users to do declarative ingests rather than writing customized code on top of the framework.

Fritz Mueller: we're still having trouble coordinating releases between Qserv and the operator. There are rough edges in configuring Qserv from the operator. We need a solution here:

  • submodules, as proposed by Fabrice Jammes, are not the best option here (though it's possible to use them)
  • we might set up GHA (GitHub Actions) to do cross-repo CI, though it's a complicated problem that requires configuration effort

Fritz Mueller another possibility would be to put everything (Qserv and the Operator code) into one repo. The team should think about possible complications of that.

Igor Gaponenko what about dependencies? The operator may have dependencies that conflict with those of the Qserv container.

Unknown User (npease) is concerned about the build time of the operator.

Igor Gaponenko what would the "great unification" mean from the perspective of the source tree and build tools (CMake)?

  • Unknown User (npease) combining the source trees is not a problem. Different tools would still be used for each.

Igor Gaponenko integrating the ingest framework into the single Git package will create problems for potential users who might need to customize it for specific ingests. A solution would be to turn the ingest framework into a customizable, ready-to-use tool rather than a framework.


Progress on topics discussed at the previous meeting (Database Meeting 2022-01-19)

Igor Gaponenko published an extended version of the proposal for Possible deployment and management scenarios for Qserv. He also started working on improving the internal communication protocol of the Replication/Ingest system to make it "cloud-friendly" (DM-33376).

John Gates merged DM-31537 into the main branch.

Igor Gaponenko Any news on the new version of XROOTD that was mentioned at the last meeting?

  • Andy Hanushevsky The improved version of the redirector has increased parallelism by an order of magnitude (depending on the number of cores available on a machine). It will be available in release 5.4.1. To address Igor Gaponenko's concerns: no backward-incompatible changes are expected in the configuration system. We still need to check whether the affinity-specific improvements (developed for Qserv and used by John Gates) still work in the new version.

Documenting Qserv


Fabrice Jammes reported trouble with the LSST documentation portal & tools.

Fritz Mueller proposed moving the documentation tools into the Qserv build container rather than keeping them in a separate documentation container, as is implemented now.

Fritz Mueller Move version-specific documentation into the Qserv Git package and keep using Confluence for external (catalog) references only. LSST Doc has many benefits; for example, it automatically links to the Git release.

Igor Gaponenko expressed concerns regarding the organizational structure and visual styles of the Sphinx-generated documentation adopted by LSST.

  • Fabrice Jammes suggested looking at other tools, such as those used for generating the Kubernetes documentation. That documentation looks nicer and has a more useful and convenient navigational structure than what LSST relies upon.

Action items

  • Igor Gaponenko should fill out the remote work forms before the end of the week. 
  • John Gates should fill out the remote work forms before the end of the week.
  • Fritz Mueller expects the team to provide feedback on an idea to unify all Qserv-related code (Qserv itself, the operator, and the ingest framework) into a single Git package.
  • Fritz Mueller will make the PR on adding logging to qhttp (DM-33088)
  • Fritz Mueller will find the location of the DP0.2 schema and data for Igor Gaponenko