Zoom Link
Time
8 am PT
Attendees
Brian Yanny, Wei Yang, Richard Dubois, James Chiang, Wen Guan, Mikolaj Kowalik, Edward Karavakis, Michelle Gower, Jen Adelman-Mccarthy, Fabio Hernandez, Tim Jenness
Regrets
Links
- CM/Panda Interaction: https://confluence.lsstcorp.org/x/f9lGDQ
- Panda Status: link
- Panda team's Rubin Work/Priority List
- CE monitoring at Lancaster: https://lsst.lancs.ac.uk/fabric/
Agenda:
- CM news
- Panda news
- uniform PQ name and batch job name?
- news on using Lancaster monitoring for HammerCloud? can it submit Panda jobs?
- Panda in IDF: should we delete them? How?
- FYI: Update on Panda meets Rucio: discussion in Rucio data replication meeting on deterministic vs non-deterministic RSE
Notes:
- CM News
- Can now use CM tools to submit to multi-DF (before, we had to use bare bps commands). However, some features are still needed in CM tools, e.g. scripts to run the chain-collection command.
- Working on accessing remote sites directly for debugging purposes.
- Seeing a few jobs with long wall time >> CPU time.
- More stress tests after the Panda config changes made during winter break? Sierra will try later this week or next week.
- Heartbeat info from payload?
- The pipeline team implemented logging infrastructure, and it should be turned on.
- The pilot should monitor this to avoid killing tasks.
- Wen will check and send a message to the pipeline team that we want to configure the logging infrastructure to emit heartbeat every 2h (ideally ~30m).
- Where to set the bps retry? In bps submission yaml. Currently per task? Wen will work on finer granularity.
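The heartbeat item above could be checked on the pilot side roughly as follows. This is only a sketch: the function name, the `missed_allowed` tolerance, and the idea of comparing timestamps directly are assumptions for illustration, not the actual pilot or pipeline logging interface. The 2 h default mirrors the interval mentioned in the notes.

```python
from datetime import datetime, timedelta, timezone


def payload_looks_stalled(last_heartbeat, now,
                          interval=timedelta(hours=2),
                          missed_allowed=2):
    """Return True if the payload has been silent for too long.

    Hypothetical helper: a payload is treated as stalled only after more
    than `missed_allowed` heartbeat intervals have passed without a
    heartbeat, so a single late message does not get the task killed.
    """
    return now - last_heartbeat > interval * missed_allowed


# Usage with made-up timestamps: 3 h of silence is tolerated, 5 h is not.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(payload_looks_stalled(now - timedelta(hours=3), now))
print(payload_looks_stalled(now - timedelta(hours=5), now))
```

With a ~30 m interval, the same tolerance would flag a silent payload after about 1 h instead of 4 h.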
- Panda News:
- fixed a problem in bps report.
- Event service is not ready yet. Need to fix a panda monitoring problem (Postgres specific).
- Readiness probe in k8s (suggested during Panda K8s deployment review) implemented.
- Panda/iDDS accept ping tests. These currently probe the web frontend, not the backend agents.
- HammerCloud
- Peter is looking into it, based on Lancaster CE monitoring
- will include pipeline check jobs submitted via Panda.
- Panda meets Rucio
- Steve is working on registering output to Rucio. His script does in-place registration (avoiding a copy/upload). It works with non-deterministic RSEs, but the main RSEs at all DFs are deterministic.
- Concern about the extra info stored in the Rucio DB if non-deterministic RSEs are used widely.
- To use deterministic RSE, one question is how to handle Rucio scope.
- To be discussed in Rucio data replication meeting.
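To illustrate why the scope question matters for deterministic RSEs: on a deterministic RSE the physical path is computed from the scope and file name, so nothing extra has to be stored per replica, but files must already sit where the computation says. The sketch below assumes the hash-based convention that Rucio uses by default, as we understand it, in simplified form. The function name and example values are made up, and the RSEs discussed here may be configured with a different lfn2pfn algorithm.

```python
import hashlib


def deterministic_path(scope: str, name: str) -> str:
    """Simplified sketch of a hash-style deterministic path.

    Assumption: path = scope / md5[0:2] / md5[2:4] / name, where the md5
    is taken over "scope:name". Real deployments may use another scheme.
    """
    digest = hashlib.md5(f"{scope}:{name}".encode("utf-8")).hexdigest()
    return f"{scope}/{digest[0:2]}/{digest[2:4]}/{name}"


# Hypothetical scope and file name: the same file name maps to a
# different path under a different scope.
print(deterministic_path("scope_a", "output.fits"))
print(deterministic_path("scope_b", "output.fits"))
```

Because the scope is part of the computed path, in-place registration on a deterministic RSE only works if the existing file layout matches the chosen scope and naming scheme.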