Monika Adamow 


  • Meeting recorder (Last two times were Brian, Colin)
  • Announcements
  • Reviewed actions from last meeting. 
  • Campaigns
    • RC2 (HFC): w_2023_03 in progress on step3. Two pipeline errors to follow up on in Slack.
    • DC2 (OE): started w_2023_05 step1. Config location doesn't exist. Will ask in Slack.
    • AuxTel (HL): will run to coadd now.
    • AP (EH): finished a run; will run analysis.
      • bps Parsl database manager logging is stuck at debug level for no apparent reason. Looked everywhere, and it should be at info level.
    • PDR2 (JA): has a list of questions in a Google Doc. Yusra answered some.
      • YA: RC2 and DC2 have different skymaps. RC2 is a subset of HSC-PDR1. 
      • JA: will look into setup. How big is too big for the qgraph? Whether to use cm_tools or divide things manually?
      • YA: there is experience with grouping from DP0.2.
      • EC: cm_tools can split into groups. Started with Orion's yaml. There are more things to keep track of when it's split into groups, but if the graph is too big and crashes, there is more to track too. Graph generation can sometimes take 16 GB. What is the safe size for bps and panda?
      • HL: in DP0.2, a reasonable group size was determined from experience before starting each step.
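The "divide things manually" option raised under PDR2 can be pictured with a minimal Python sketch. It is not how cm_tools splits work; the function name, the tract IDs, and the group size of 4 are all hypothetical, and a safe group size would still have to be found empirically as HL described.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def split_into_groups(items: Iterable[T], group_size: int) -> Iterator[List[T]]:
    """Yield consecutive groups of at most group_size items, preserving order."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    iterator = iter(items)
    while True:
        group = list(islice(iterator, group_size))
        if not group:
            return
        yield group


# Hypothetical input: ten made-up tract IDs, at most four per group.
tracts = list(range(9000, 9010))
groups = list(split_into_groups(tracts, 4))
print([len(g) for g in groups])  # prints [4, 4, 2]
```

Each group would then get its own, smaller qgraph, at the cost of having more submissions to track.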
  • Tooling
    • OE: doing a run using cm_tools and w_2023_48, and comparing results. Got an out-of-memory error.
    • EC: "table of errors". Tracking errors takes a lot of time, so the idea is a database of errors and the actions that should be taken (rescue, have somebody look at it, etc.). The first table is for error types. The second is for instances, and it links to particular jobs. Used to make go/no-go decisions and to aggregate errors.
    • OE: wants to store error messages, to save the time spent going through panda or production_tools or tracking submission errors.
    • YA: future 4-week sprint: identify pain points and make cm_tools solve them.
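The two-table layout EC described (one table of error types, one of instances linked to jobs) could look roughly like the sketch below. This is an illustration using Python's built-in sqlite3 module, not the actual cm_tools design; the table names, column names, matching rule (plain substring), job IDs, and error messages are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE error_type (
    id      INTEGER PRIMARY KEY,
    pattern TEXT NOT NULL,   -- substring that identifies this kind of error
    action  TEXT NOT NULL    -- what to do about it, e.g. 'rescue' or 'review'
);
CREATE TABLE error_instance (
    id            INTEGER PRIMARY KEY,
    error_type_id INTEGER REFERENCES error_type(id),  -- NULL if no type matched
    job_id        TEXT NOT NULL,
    message       TEXT NOT NULL
);
""")


def record_error(job_id, message):
    """Store one error instance, linked to the first type whose pattern occurs in the message."""
    row = conn.execute(
        "SELECT id FROM error_type WHERE instr(?, pattern) > 0 ORDER BY id LIMIT 1",
        (message,),
    ).fetchone()
    type_id = row[0] if row else None
    conn.execute(
        "INSERT INTO error_instance (error_type_id, job_id, message) VALUES (?, ?, ?)",
        (type_id, job_id, message),
    )
    return type_id


def summarize():
    """Count error instances per recommended action; unmatched ones count as 'unclassified'."""
    rows = conn.execute("""
        SELECT COALESCE(t.action, 'unclassified'), COUNT(*)
        FROM error_instance AS i
        LEFT JOIN error_type AS t ON i.error_type_id = t.id
        GROUP BY COALESCE(t.action, 'unclassified')
    """).fetchall()
    return dict(rows)


# Hypothetical error types and job errors, for illustration only.
conn.execute("INSERT INTO error_type (pattern, action) VALUES ('MemoryError', 'rescue')")
first = record_error("job-1", "MemoryError: unable to allocate array")
second = record_error("job-2", "MemoryError: out of memory in step3")
third = record_error("job-3", "unexpected failure with no known pattern")
print(summarize())
```

An aggregate like the one returned by summarize() is the kind of input a go/no-go decision could use, and the unclassified rows are the ones somebody still has to look at.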
  • Show and tell
    • v2 demo from Fritz
      • cm_tools v1 is command-line only; v2 is a web service.
      • uses SQuaRE's Safir package for the FastAPI service
        • Redis database backend
        • structured logging output
        • automatically extracted documentation
      • rework the command-line client
        • talks to the same API that the web service uses
      • use the more generic arq framework instead of being Slurm-only
      • Path forward: port the data model of v1 cm_tools to the new framework. Reach parity and then transition. 
      • EC: start PR and code review on Fritz's branch, not on Eric's v1 package. 
      • FE: Flask rather than JavaScript for the web GUI? CS's production_tools error summary page.
    • build-gather-resource-usage-qg (Yusra, Notebook) → next time 

  • AOB