Notes from the previous meeting

Discussion items

(tick)Project news

Fritz Mueller:

  • MTA balancing and M1M3 testing at inclination are proceeding at the summit.
  • AuxTel run in progress over the next few days, providing stimulus for the prompt-processing prototype, which processed images and deposited results into the embargo butler last night!
  • Some hair is catching fire now re. the USDF shutdown (6/24-7/4).  See notes below re. DP0.3; the plan is also to use the summit EFD and spare storage at the summit, and catch up with USDF after it comes back online.
  • Fritz's upcoming travel (will post to group calendar when dates are finalized):
    • 1st week of June: Tucson
    • Mid-June: Vacation
    • Late-June: Chile for 2 weeks.

Igor Gaponenko noticed an invitation to register for the Rubin Operations Project and Community Workshop 2023 (PCW). Note that the announcement mentions a limit of 300 on-site participants.

(tick)Status of DP0.3

Fritz Mueller:

  • CST has a DP0.3 prep workshop scheduled for the week of the USDF shutdown (6/25-7/4).  We (Colin, Fritz, Frossie, Dan) are investigating setting up an interim Postgres for them in the cloud to host the DP0.3 catalogs just for that week (to be torn down as soon as USDF comes back online).  Dan is having a look and putting together a proposal; we should have this within a day or two.
  • Colin informs us that the SS workshop is taking place earlier than we had thought (mid-June), and this will overlap Fritz's vacation.  Fritz to make sure Colin and Igor have keys and notes ahead of time, in case any Postgres support is needed during that period.
  • Postgres + pgsphere at USDF seems to be working well for DP0.3 so far.

Using qserv-ingest at UKDF for ingesting Gaia_EDR3 (continued from last week)

Igor Gaponenko:

  • It looks like the latest version of the workflow works fine at IDF with the contribution files being pulled from IN2P3.
  • Discussing an option to get direct access to the Qserv Kubernetes cluster at UKDF for Igor and Fabrice Jammes in order to investigate an issue with timeouts. Getting an account there may take at least one week.

Alleged memory leak in the Replication Controller REST services (continued from last week)

Fritz Mueller:

  • Fritz to re-enable jemalloc now in Qserv container builds
  • The consensus is to do this via LD_PRELOAD env var (e.g. in entrypoint script) rather than a static link
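
For illustration only, the LD_PRELOAD approach could look roughly like the sketch below: a small entrypoint wrapper that puts jemalloc first in LD_PRELOAD and then replaces itself with the service process. The library path and the helper name `env_with_preload` are assumptions made for this example, not taken from the actual Qserv container builds, and the real entrypoint script may well be a shell script instead.

```python
import os
import sys

# Hypothetical path: where libjemalloc lives in the Qserv image is an assumption.
JEMALLOC_LIB = "/usr/lib/x86_64-linux-gnu/libjemalloc.so.2"


def env_with_preload(env, lib=JEMALLOC_LIB):
    """Return a copy of env with lib placed first in LD_PRELOAD.

    Any libraries already listed in LD_PRELOAD are kept after lib,
    and lib is not duplicated if it was already present.
    """
    new_env = dict(env)
    existing = [p for p in new_env.get("LD_PRELOAD", "").split(":") if p and p != lib]
    new_env["LD_PRELOAD"] = ":".join([lib] + existing)
    return new_env


if __name__ == "__main__" and len(sys.argv) > 1:
    # Replace this process with the requested command so the dynamic
    # loader picks up jemalloc when the service starts.
    os.execvpe(sys.argv[1], sys.argv[1:], env_with_preload(os.environ))
```

Compared with a static link, this keeps the allocator choice out of the build: dropping the variable (or the wrapper) returns the service to the default allocator without rebuilding the binaries.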

Igor Gaponenko on the memory "leak":

  • DM-39080
  • Good progress on this ticket. I'm about to make the PR.

(tick)DP0.2 static ObsCore update needed at SLAC

Fritz Mueller:

  • Fritz is to make the same manual static ObsCore patch at SLAC prod Qserv as was done at IDF
  • DM-38507

Igor Gaponenko: note that we have 3 identical Qserv instances at USDF that need to be updated. See Managing Qserv instances at SLAC.

(tick)File-based result delivery, Qserv lockups

Fritz Mueller:

  • Qserv release 2023.5.1 was cut last week, which includes John's patch for the cancellation-triggered lockups.  This was deployed to IDF-int and SLAC-prod.  Will roll out to IDF-prod during patch Thursday tomorrow.

(tick)Cassandra at USDF

Fritz Mueller:

  • Still need to schedule a meeting for this; stay tuned...
  • Colin notes AP prompt-processing has momentum now; we want to keep that moving, which could mean interaction with a Cassandra APDB soon-ish.

Action items