Date

Attendees

Discussion items

Time | Item | Who | Notes

Project news
  • (budget) moving from the construction phase to the operation, Qserv goals are to be defined differently (from building to enhancing)
  • (DM) Frossie will advocate cloud-ness to NSF Large Facilities Office

Progress on topics discussed at the previous meeting, Database Meeting 2022-04-13

team

Igor Gaponenko:

  • Tables Visit and CcdVisit have been ingested into qserv-int at IDF

Fritz Mueller: TAP schema on the INT cluster needs to be updated

Unknown User (npease): what's the status of DM-32600?

Fabrice Jammes: on the qserv-operator upgrade

  • the Replication system's worker registration service has been integrated and tested. Still waiting for the review of the PR for DM-34379
  • moving the Replication Controller to the czar node has solved the OOM (killing) problem

Fabrice Jammes: on qserv_ingest

  • ASYNC mode works well for ingesting the 40 TB catalog
  • added support for ingesting regular tables

Status of the parquet-to-csv translator

team

Igor Gaponenko: the parquet-to-csv conversion application performs poorly. It was tested on a 7 GB Parquet file, and the conversion produced a 14 GB CSV file. The write speed was 7 MB/s on average. CPU utilization during the conversion was 100% (exactly 1 core, which is no surprise considering this is a sequential application written in Python). What's actually worse is the memory consumption of the application:

  • Virtual memory (VM): 40 GB
  • Physical memory: 8 GB

...which means it's impossible to utilize more than (say) 2 or 3 cores on a large-memory machine (128 GB). In reality, we might not have such large-memory machines at GKE.

To illustrate what this means for translating the Parquet files of the DP02 catalog: at the measured write speed, it would take about 80 hours to translate 1 TB of input data (that is, to produce 2 TB of output data) using 1 core:

2 TB / 7 MB/s ≈ 285714 seconds ≈ 80 hours
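The estimate can be reproduced with a few lines of Python. This is only a sketch based on the figures quoted in these notes, with two assumptions that the notes do not state explicitly: decimal units are used (1 GB = 1000 MB), and the number of translators that can run side by side is bounded by the 40 GB virtual memory size of each process. The function names are made up for illustration.

```python
# Figures reported in the notes (decimal units assumed: 1 TB = 1e6 MB).
INPUT_FILE_GB = 7.0        # size of the test Parquet file
OUTPUT_FILE_GB = 14.0      # size of the resulting CSV file
WRITE_SPEED_MB_S = 7.0     # average CSV write speed on one core
VM_PER_PROCESS_GB = 40.0   # virtual memory size of one translator process
HOST_MEMORY_GB = 128.0     # memory of the "large memory" machine


def hours_per_input_tb(input_tb: float = 1.0) -> float:
    """Hours needed by one core to translate `input_tb` TB of Parquet input."""
    expansion = OUTPUT_FILE_GB / INPUT_FILE_GB   # CSV is 2x larger than Parquet
    output_mb = input_tb * expansion * 1e6       # TB -> MB
    seconds = output_mb / WRITE_SPEED_MB_S
    return seconds / 3600.0


def max_parallel_translators() -> int:
    """Upper bound on concurrent translators, assuming VM size is the limit."""
    return int(HOST_MEMORY_GB // VM_PER_PROCESS_GB)


if __name__ == "__main__":
    hours = hours_per_input_tb(1.0)
    workers = max_parallel_translators()
    print(f"1 core: {hours:.1f} hours per TB of input")
    # Ideal scaling is assumed here; real speedup would likely be lower.
    print(f"{workers} cores: {hours / workers:.1f} hours per TB of input")
```

Even with ideal scaling across the 3 processes that would fit into 128 GB, each terabyte of input would still take more than a day to translate.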

My impression is that this application may need to be rewritten in C++ using this API https://arrow.apache.org/docs/cpp/parquet.html ... unless someone finds a way to dramatically speed up the Python code.

Igor Gaponenko: the second issue is that the translator can't find the values of the objectId column. Details on this subject can be found in the DAX team forum https://lsstc.slack.com/archives/G2JPZ3GC8/p1650422856573549

There are two options to frame Parquet files:

  • Fritz Mueller: data frames (we need to coordinate this with the Pipeline team)
  • Kian-Tat Lim: splitting a file into a dataset (many files)

Fritz Mueller: it would be convenient to pass the Felis file to the translator

Andy Hanushevsky: will evaluate this option. Fritz Mueller will provide a sample file to Andy Hanushevsky 

On the Parquet "index":

  • Andy Hanushevsky: will add an option to enable the index on demand
  • Igor Gaponenko: will work with Hsing-Fang on updating the Argo-based workflow to 

Extended geometry support in Qserv

Context:

  • there is an ongoing discussion on this subject at https://lsstc.slack.com/archives/C8EEUGDSA/p1650411741492219
  • needed in the context of delivering image service support (OPS TAP service). We need to record regions for exposures. It is desirable for Qserv to support queries that select objects in a given region, or queries that allow for overlaps between regions.
  • Qserv supports binary data columns (polygons) that may be used for that
  • we need (at least) to replicate OPS TAP (meta-)data in Qserv. In which table? How many rows are we talking about here?
  • Fritz Mueller: the first option would be to put these into a fully-replicated table. This should work for DP02. We will look into partitioning these data for the larger-scale catalogs. The overlap radius could be a problem here as it's much larger for image metadata. That would cause a compatibility problem with other tables and would prevent Qserv from JOINing rows of these tables.
  • Kian-Tat Lim: we may also investigate the possibility of using the HTM-based index as an alternative.
  • Fritz Mueller: and finally, there is an option not to use Qserv and to merge at the TAP level(?) 
  • Andy Salnikov: has been tasked with writing the butler-based converter for extracting and exporting OPS TAP (meta-)data

The bottom line:

  • the DAX team will investigate options
  • a solution will depend on the number of rows

Improving error reporting and query processing statistics at czar.

team

Context:

  • DM-34464
  • collecting ideas

John Gates will take over this ticket


Using qserv-ingest to ingest the Object table of DP02

Igor Gaponenko: trying to understand how to use qserv-ingest to ingest the partitioned products of the pre-ingest steps at IDF into Qserv at IDF.

Andy Hanushevsky: ATLAS has an S3-to-HTTP gateway that could be used here

Kian-Tat Lim: GCS has a service that allows objects to be read via HTTP. This can be configured in the Cloud Console permissions for a bucket by selecting "Allow public access".
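For reference, reading an object from a publicly readable GCS bucket over HTTP could look roughly like the sketch below. It assumes the standard `https://storage.googleapis.com/BUCKET/OBJECT` URL form for public objects; the helper names, the bucket name and the object path are hypothetical and are not taken from the actual DP02 setup.

```python
from urllib.parse import quote
from urllib.request import urlopen

GCS_ENDPOINT = "https://storage.googleapis.com"


def public_object_url(bucket: str, object_name: str) -> str:
    """Build the HTTP URL of an object in a publicly readable bucket."""
    # Slashes in the object name are kept; other special characters are escaped.
    return f"{GCS_ENDPOINT}/{quote(bucket)}/{quote(object_name, safe='/')}"


def read_head(url: str, nbytes: int = 1024) -> bytes:
    """Fetch the first `nbytes` bytes of the object (requires network access)."""
    with urlopen(url) as response:
        return response.read(nbytes)


if __name__ == "__main__":
    # Hypothetical bucket and object names, for illustration only.
    url = public_object_url("example-bucket", "dp02/object/chunk_0.txt")
    print(url)
```

A URL of this form is the kind of input an HTTP-based ingest would need, provided the bucket has public access enabled.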

Igor Gaponenko: the Ingest system could be improved to read S3 buckets directly



Action items
