Mondays 12PT (3 - 3:50pm ET)

Yusra's Zoom: https://princeton.zoom.us/my/yusra

Attendees:

Regrets:

Agenda:

  • Meeting recorder - (last 6 meetings were:  Lee, Robert, Arun, Clare, Lee, Colin) 
  • Announcements
    • SLAC is down this week - metrics meeting plots available on tiger for this week only
    • 2 meetings ago, open actions to investigate stellar locus jumps in RC2_subset (Eli)
      • Eli update: not entirely necessary to run everything by hand
      • tracking wperp - very sensitive to RC2_subset (noise? local minima?)
      • connects to aperture corrections on stars in single frame
      • band mis-registering can impact coadd-level photometry
      • we were also not handling primary detections properly in single-frame
      • as of 2 nights ago: still at 2.49 millimags (i.e., a little better), so good news.
    • Clare w19 report:
      • fixed issue with meas_algorithms not propagating proper motions correctly
      • expected little difference - actually has some impact, reducing scatter, so more good news
      • does not impact gbdes directly, but improvements will filter through to analysis_tools
    • Jim: plan for future of RC2_subset is to run on a cronjob, rather than Jenkins
      • Eli: you /can/ download the outputs from Jenkins if you click the correct buttons!
  • Processing Status
    • Processing of w_2023_23 RC2 (Orion) DM-39610
      • expected changes: 
        • Clare: astrometric repeatability metrics - now in analysis_tools; more work required to push to Sasquatch (maybe selection?)
          • Clare: add analyzeMatchedVisitsCore to drp_pipe step8
        • Eli: three detectors in RC2 that were formerly crashing now fixed
        • Orion: cm_tools changes:
          • when one step is complete, the next will now auto-begin (DM-39610)
        • RA/Dec columns were renamed
      • report:
        • very little to report on the pipelines side
        • Orion & Eli chased down aperture correction warnings incorrectly being reported as errors
        • encountered many frustrating pandas issues
        • broke faro into three different subsets - now implemented in drp_pipe
          • hard to run faro en masse
          • for the next CM run, this is how we will run faro
        • encountered step 4 errors as usual
        • subtract images still encountering issues as in previous months
          • Orion: look into location to store resource usage plots
      • chronograf report:
        • Yusra/Lee: we can look into whether we want to keep tracking both robust and non-robust sky object flux metrics
        • Nate: ellipticity residuals look somewhat volatile
          • Jim: perhaps a "meta-problem" (plotting issue on Chronograf) - adding a threshold to the Chronograf plots may indicate that we shouldn't pay too much attention to relatively small-scale fluctuations
      • memory / runtimes (w15 → w23)
        • consolidateObjectTable memory usage increase (from ~17 to ~20 GB)
        • a single patch is using a significant amount of memory for deblend
        • finalizeCharacterization now taking roughly an order of magnitude longer to run - can be refactored, but not a priority now
        • forcedPhotCcd runtime down a little
        • fgcmOutputProducts runtime now up by an order of magnitude
          • this task is all I/O bound, so probably just cluster issues
        • Clare: issue noticed when running RC2 with BPS/HTCondor
          • cores not always relinquishing memory from a prior task, e.g., going from a high memory task to a low memory task
          • end up not being able to run as many tasks at a time as you would expect
          • may not impact mem-usage plots (it's the cap on available memory per-core)
          • is cm_tools impacted by this issue?
          • Orion: cm_tools does little with memory requests - uses either default mem-request, or manual memory request override
            • cm_tools/Orion had a lot of trouble with memory requests recently
            • PanDA used to retry failed jobs with more memory, but now retries with the same memory (loss of functionality)
            • much of this is hidden inside the PanDA black box, requiring manual resubmission with memory request overrides
            • unknown how much cm_tools is impacted by this right now - will talk more offline
  • Review the w_2023_23 rerun:
    • Recall 2023 history:
      • w03:  Issues with using finalize characterize PSFs downstream. 
      • w07: ip_diffim. Cannot find plugin (fixed on DM-38209) run with ticket branch. Issues with finalizeCharacterize downstream gone.
      • w11: GBDES on.  detectAndMeasure segfaults (fixed with hours to spare for w15) 
      • w15: pseudo- quasi-random skyObject placement (DM-23781), source selectors use isPrimary (DM-39141)
      • w19: Parallax and PM on (DM-37943), astrometric match improved (DM-38808), cleared mask plane of the template (DM-38901)
      • w23: DM-39141 AMx/ADx/AFx
    • Do the metrics on Sasquatch look OK? (see below)
    • Any changes in the resource usages? (see below) 
  • What do we expect for w_2023_27? 
    • CS: we reinterpreted AFx, so we expect the baseline AF values to be higher than we were used to before.
    • Dan has some new plots in analysis_tools (cmodel bulge/disk) in extended so they don't get run automatically
    • repeatability metrics (both astrom/photom) on Sasquatch
    • those 3 errors in step 1 gone!
  • AOB:
    • Jim: any analysis_drp plots that can be retired due to replacement in analysis_tools?
      • Sophie: we have a list, and we can start to prioritize that
        • Sophie: make a new list for outstanding analysis_drp plots that require moving, send to Jim


Sasquatch:



Memory changes: 

Runtime changes:  runtime_w23vs15.png



Action Items

Description | Due date | Assignee | Task appears on
Add a plot with fakes stats to the dashboard (Sophie Reed) | 04 Sep 2020 | Sophie Reed | DRP Metrics Monitoring 2020-08-07
Sophie to add field in metric definition to hold thresholds (DM-43364): We need to talk about this when Sophie is back! | | | DRP Metrics Monitoring 2024-04-22
Sophie to add field in metric definition to hold thresholds (DM-43364) | | | DRP Metrics Monitoring 2024-03-18
Clare: add analyzeMatchedVisitsCore to drp_pipe step8 | | | DRP Metrics Monitoring 2023-06-26
Sophie: make a new list for outstanding analysis_drp plots that require moving, send to Jim | | | DRP Metrics Monitoring 2023-06-26
Turn catchFailures on in calibrate. Add flag to indicate that deblender failed because PSF is bad. | | | DRP Metrics Monitoring 2022-10-31
Yusra AlSayyad: Eric's account was deleted; we need to make sure he has all his logs. | | Yusra AlSayyad | DRP Metrics Monitoring 2021-06-14
Arun Kannawadi: Modify rho stats in pipe_analysis to use debiased moments (see DM-30751). | | Arun Kannawadi | DRP Metrics Monitoring 2021-04-19
 | | Arun Kannawadi | DRP Metrics Monitoring 2021-03-01
Yusra AlSayyad: Do a rerun with w50 PS1 refcat and one with shrunk refcat errors. | | Yusra AlSayyad | DRP Metrics Monitoring 2021-01-04
Jeffrey Carlin: Add an absolute astrometry match-to-refcat metric to dashboard (DM-34153) | | Jeffrey Carlin | DRP Metrics Monitoring 2021-01-04