Panda Meeting 2023-03-01


Zoom Link

Time

8 am PT

Attendees

Wen Guan, Edward Karavakis, Brian Yanny, Jen Adelman-Mccarthy, Michelle Gower, Tim Jenness, Mikolaj Kowalik, Peter Love, Wei Yang

Regrets

Agenda:

Update
1. BPS Panda submission
2. site issues
3. panda installation at USDF
Next steps
1. Panda submission of ci_hsc_gen3 work at all DFs.
2. What is the next step? Try DP0.2 with one tract, then two?
3. Then what?
  1. Optimizing Panda for a large number of small jobs: 1) clustering; 2) can iDDS group several jobs of the same task? (Wen indicated that the Panda team is thinking about this.)
  2. Experiment with data placement: once data is transferred, how do we register it with the Butler? This depends on the Rucio-Butler integration work.
  3. Knowledge transfer with the execution team?

Notes:

Update:
1. Panda DB deployment OK; Eddy is working on the backup policy. More tests tomorrow.
2. IAM and Dex integration: mostly working, still in progress.
3. New iDDS at CERN:
  1. Supports using BPS to submit Panda tasks to remote DFs (intended to optimize campaign management).
    1. Note: the Middleware team prefers that this optimization happen in BPS.
  2. Uses bulk messages to reduce the number of internal messages iDDS needs to handle.
4. ARC CE issues:
  1. USDF ARC CE: wrong VOMS .lsc file for one of the two VOMS servers (voms and voms1). Fixed.
  2. FrDF ARC CE: Harvester/HTCondor sometimes cannot find jobs. Question (Wen): when does ARC CE purge old/finished jobs? Answer (Peter): quite a long time after the job finishes, i.e. 24h; this is tunable.
5. Making Harvester/HTCondor logs available on the web: Apache was set up for this, and Wen thinks it should work.
Next Steps:
1. We would like to turn this ci_hsc_gen3 test into a routine test (similar to the idea of ATLAS HammerCloud). Peter Love is working on this.
2. Overhead: we would like to understand (or confirm) where the following latencies come from:
  1. Running ci_hsc_gen3 (in a terminal) on a single computer with several cores takes ~1h.
  2. At USDF (via Harvester running at sdfrome001, not via ARC CE), it took 1h20m.
  3. At FrDF (via ARC CE), it took 2h.
  4. Clustering tasks A, B, and C together should help.
  5. Can we group multiple A's into a single Grid job (run sequentially)? Open questions: 1) this is not easy, because we need to know how long A will take; 2) where should this be done, in clustering or in iDDS?
3. Clustering in Panda
  1. ci_hsc_gen3 provides clustering. bps_condor and bps_parsl use it.
  2. bps_panda cannot utilize clustering because of the 4000-character limit. The current workaround lets Panda run individual tasks. Michelle is working on enabling Panda to utilize clustering.
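The overhead numbers under item 2 can be turned into relative figures with a quick back-of-envelope calculation. The sketch below takes only the three wall-clock times recorded in these notes and treats the 1h terminal run as the baseline. The names `BASELINE_MIN`, `runs` and `overhead` are illustrative and do not come from any Panda or BPS tool. The extra time it reports combines everything the grid path adds, so it cannot show where the latency comes from. That breakdown is what the action item is meant to confirm.

```python
# Back-of-envelope overhead for the ci_hsc_gen3 wall-clock times quoted in
# these notes. All names here are illustrative, not from Panda/BPS.
BASELINE_MIN = 60  # ~1h: single multi-core machine, run from a terminal

runs = {
    "USDF (Harvester on sdfrome001, no ARC CE)": 80,  # 1h20m
    "FrDF (via ARC CE)": 120,                         # 2h
}


def overhead(total_min: int, baseline_min: int = BASELINE_MIN) -> tuple[int, float]:
    """Return (extra minutes, extra time as a fraction of the baseline)."""
    extra = total_min - baseline_min
    return extra, extra / baseline_min


for site, total in runs.items():
    extra, frac = overhead(total)
    # ':.0%' multiplies by 100 and rounds to a whole percentage.
    print(f"{site}: +{extra} min ({frac:.0%} over baseline)")
```

With these inputs the USDF path adds about a third on top of the single-machine run, and the FrDF path doubles it.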
Communication:
Sometimes a problem can be solved or optimized in different places, e.g. in Panda or in BPS. It would be very helpful to raise such cases for discussion when they occur, so that we can find the best place to address the problem.
