Zoom Link
Time
8 am PT
Attendees
Wen Guan, Edward Karavakis, Brian Yanny, Jen Adelman-Mccarthy, Michelle Gower, Tim Jenness, Mikolaj Kowalik, Peter Love, Wei Yang
Regrets
Agenda:
- Update
- bps panda submission
- site issues
- panda installation at USDF
- Next steps
- Panda submission of ci_hsc_gen3 work at all DFs.
- What is the next step? Try DP0.2 with one tract, then two?
- Then what?
- Optimizing Panda for a large number of small jobs: 1) clustering; 2) can iDDS group several jobs of the same task? (Wen indicated that the Panda team is thinking about something.)
- Play with data placement. Once data is transferred, how do we register it with Butler? This depends on the Rucio-Butler integration work.
- Knowledge transfer with the executing team?
Notes:
- Update:
- Panda DB deployment OK; Eddy is working on the backup policy. More tests tomorrow.
- IAM and Dex integration: mostly working, still in progress.
- CERN new iDDS
- Support using bps to submit Panda tasks to remote DFs (intended to optimize campaign management).
- Note: Middleware team prefers optimization happening in BPS
- Use bulk messages to reduce the number of internal messages iDDS needs to handle.
- ARC CE issue:
- USDF ARC CE: wrong VOMS .lsc file for one of the two VOMS servers (voms and voms1). Fixed.
- FrDF ARC CE: Harvester/HTCondor sometimes cannot find jobs. Question (Wen): when does ARC CE purge old/finished jobs? Answer (Peter): well after the job finishes, i.e. 24h; tunable.
- Making Harvester/HTCondor logs available on the web: Apache was set up for this, and Wen thinks it should work.
- Next Steps:
- Would like to turn this ci_hsc_gen3 test into a routine test (similar to the idea of ATLAS HammerCloud). Peter Love is working on this.
- Overhead. Would like to understand (or confirm) where the following latency comes from:
- Running ci_hsc_gen3 (in a terminal) on a single computer with several cores takes ~1h.
- At USDF (via Harvester running at sdfrome001, not via ARC CE), it took 1h20m.
- At FrDF (via ARC CE), it took 2h.
- Clustering task A,B,C together should help
- Can we group multiple A's in a single Grid job (run sequentially)? Two open points: 1) this is not easy, because we need to know how long A will take; 2) where should this be done, in clustering or in iDDS?
- Clustering in Panda
- ci_hsc_gen3 provides clustering. bps_condor and bps_parsl use it.
- bps_panda cannot utilize clustering because of the 4000-character limit. The current workaround allows Panda to run individual tasks. Michelle is working on enabling Panda to utilize clustering.
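A rough sketch of the "group multiple A's in a single Grid job" idea from the overhead discussion, assuming a per-job runtime estimate is available (which, as noted, is the hard part). The function name, the estimates, and the time limit are all hypothetical; this is not an existing BPS, Panda, or iDDS interface, only an illustration of greedy sequential packing:

```python
from typing import List


def group_jobs(est_minutes: List[float], max_minutes: float) -> List[List[int]]:
    """Pack job indices, in order, into groups whose summed estimate
    stays within max_minutes. A job longer than the limit gets its own group."""
    groups: List[List[int]] = []
    current: List[int] = []
    current_total = 0.0
    for idx, est in enumerate(est_minutes):
        # Close the current group if this job would push it over the limit.
        if current and current_total + est > max_minutes:
            groups.append(current)
            current, current_total = [], 0.0
        current.append(idx)
        current_total += est
    if current:
        groups.append(current)
    return groups


# Hypothetical runtime estimates (minutes) for six "A" jobs.
estimates = [10, 15, 20, 5, 30, 25]
print(group_jobs(estimates, max_minutes=45))
```

Each inner list would correspond to one Grid job running its members sequentially. Whether this logic belongs in BPS clustering or in iDDS is the open question recorded above.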
- Communication:
Sometimes a problem can be solved or optimized in different places, e.g. in Panda or in BPS. It will be very helpful to raise a discussion whenever a scenario like this occurs, so that we can find the best place to address the problem.
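For reference, the overhead figures quoted under Next Steps can be put side by side with a few lines of arithmetic. This is only a sketch: the baseline is the approximate ~1h single-machine run, so the percentages are indicative, and the function and variable names are illustrative:

```python
BASELINE_MIN = 60  # ~1h single-machine run of ci_hsc_gen3 (approximate)


def overhead(run_min: float, baseline_min: float = BASELINE_MIN):
    """Return (extra minutes, extra as a percentage of the baseline)."""
    extra = run_min - baseline_min
    return extra, 100.0 * extra / baseline_min


# Wall-clock times from the notes, in minutes.
runs = {
    "USDF (Harvester, no ARC CE)": 80,   # 1h20m
    "FrDF (ARC CE)": 120,                # 2h
}

for site, minutes in runs.items():
    extra, pct = overhead(minutes)
    print(f"{site}: +{extra} min ({pct:.0f}% over baseline)")
```

With these numbers the USDF run is about 20 minutes (roughly a third) over the baseline and the FrDF run about 60 minutes (roughly double the baseline).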