Mondays 12PT (3 - 3:50pm ET)

Yusra's Zoom: https://princeton.zoom.us/my/yusra

Attendees:

Yusra AlSayyad, Lee Kelvin, Erfan Nourbakhsh, Fred Moolekamp, Clare Saunders, Joshua Meyers, Orion Eiger, Robert Lupton, Eli Rykoff, Colin Slater, Lauren MacArthur, Hsin-Fang Chiang, Jim Bosch, Nate Lust

Regrets:

Agenda:

  • Meeting recorder - Clare! (last 6 meetings were:  Lee, Colin, Jim, Keith, Fred, Nate) 
  • Announcements
    • None
  • Review Action items from last month
    • Yusra AlSayyad and Orion confirmed that w_2022_48 was the weekly used in the success mystery from last time.
    • Dataset types - parquet table as an astropy table instead of a dataframe - this gets propagated everywhere.
    • Failures from w_2023_03 - prod will be slightly changing with Jim Bosch's changes.
  • Processing Status
    • W07 - lots of problems but most do not have to do with pipelines
      • issue with ip_diffim - fix needed on step4
        • you can now run on a ticket branch (used DM-38209)
        • Response from Yusra: While testing this branch, a memory problem (as in memory corruption) was found on ip_diffim main (as of two weeks ago). It was probably introduced by DM-32406. Until this is fixed, we still need to run on a ticket branch.
        • Not clear why DM-32406 would cause memory issues, maybe something upstream
        • Eli: I apologize if this has to do with the pybind11 consolidation. Yusra thinks this was not the problem, because seg faults happened before Matthias's ticket. DM-32406 was the only new ticket merged to main since the previous successful run.
      • everything else worked except logging errors
      • done since last Tuesday, but dispatch has not been working since then.
  • Review the w_2023_07 / DM-38042 rerun:
    • We don't have metrics, but we do have plots.
    • Recall that weekly 03 is the one where we didn't have the objects on the edges of the tracts.
    • We will compare to w_2022_48, because that is the last good one since Jim's major pipeline changes.
    • Lots more plots than in w48.
    • Some stats are still missing on the two-histogram plots
    • Some astrometry difference plots don't have the expected distribution (this is not new). Clare Saunders is going to look into this.
    • Comparing resource usage between w03 and w07. There are some big differences that are probably tied to the w03 issues. 
  • step4 problems.
    • finding plugin fixed on DM-38209 (but testing with main ip_diffim shows that there might be memory problems in w11: https://lsstc.slack.com/archives/C025SQLKV0X/p1678143478405449)
    • Recall history:
      • w12: psfex
      • w16: piff  (bad size residuals)
      • w20: finalizeCharacterize (bad apcorr configs → bad stellar locus)
      • w22: lanczos11 + apcorr configs (better stellar locus and size residuals approx equivalent to psfEx!)
      • w24: PIFF kernelSize to 25. new scarlet lite storage.
      • w28: attempt at fixing measure failures: DM-35722
      • w32: First RC2 at SLAC. subtractImages compatibility mode on
      • w36: subtractImages Compatibility mode off
      • w40: 
      • w44:
      • w48: 9697/7 succeeds, extra subtractImage failures gone
      • w03:
      • w11:
    • Chronograf, plot-navigator review of _07
  • w_2023_06 DC2:
    • Orion - there were no errors at all. Jim says no jointcal (i.e. no tract based steps) means no problem.
    • g band was lost in a previous rerun but is now back.
    • stellar_locus_width_wPerp is way up
      • Eli merged a change in how we compute aperture correction maps, but the stellar locus is mysterious
      • Jim: Is the selection now including more things?
      • Eli: There are not a huge number of stars that go into calculation.
    • On the nightly you can see some big jumps in a few metrics
      • One change was in AM2 g-band - this lines up with gbdes being turned on
      • Stellar locus - seems to be tied to the aperture correction change.
        • The stellar locus would be changed by the extendedness, which is affected by the change in aperture corrections.
        • We will now have more faint objects
        • Second jump in stellar locus probably tied to gbdes
        • Jim: Not panicking yet, but need to see what happens on the monthly rerun
        • Didn't see this jump in the RC2 stellar locus plots - you can see some change on the plots, and you also see that the number of stars goes up – different selection effects.
        • Jim and Robert: what can we do to mitigate the fact that our metrics are very sensitive to selection effects?
  • What do we expect next time 
    • potential memory issues in ip_diffim after DM-32406 
    • I'm sorry DM-38209 didn't get in before w11. You'll need to run with a branch again (sad)
    • FGCM now uses IsolatedStarAssociation instead of its own associator. This shouldn't cause any major changes, but there will be different random selection, and the tasks have changed.
    • Orion is trying to get w11 running using cmtools, but it is not working yet. We don't know how to run with cmtools on a ticket branch.
      • Yusra: longer than 4.5 weeks (runtime of w07) is too long. If it takes longer than two weeks because we are trying to figure out how to run, that is workable. 
  • Other notes:
    • Yusra: remember that you should be looking at plots in the areas that you are responsible for!
    • Robert: How far are we from just getting an alert from the metrics that we should look at the plots?
  • AOB:



Action Items

Description; Due date; Assignee; Task appears on
  • Add a plot with fakes stats to the dashboard. Sophie Reed
    Due date: 04 Sep 2020; Assignee: Sophie Reed; Task appears on: DRP Metrics Monitoring 2020-08-07
  • Sophie to add field in metric definition to hold thresholds. DM-43364: We need to talk about this when Sophie is back!
    Task appears on: DRP Metrics Monitoring 2024-04-22
  • Sophie to add field in metric definition to hold thresholds. DM-43364
    Task appears on: DRP Metrics Monitoring 2024-03-18
  • Clare: add analyzeMatchedVisitsCore to drp_pipe step8
    Task appears on: DRP Metrics Monitoring 2023-06-26
  • Sophie: make a new list for outstanding analysis_drp plots that require moving, send to Jim
    Task appears on: DRP Metrics Monitoring 2023-06-26
  • Turn catchFailures on in calibrate. Add flag to indicate that deblender failed because PSF is bad.
    Task appears on: DRP Metrics Monitoring 2022-10-31
  • Yusra AlSayyad: Eric's account was deleted; we need to make sure he has all his logs.
    Assignee: Yusra AlSayyad; Task appears on: DRP Metrics Monitoring 2021-06-14
  • Arun Kannawadi: Modify rho stats in pipe_analysis to use debiased moments (see DM-30751).
    Assignee: Arun Kannawadi; Task appears on: DRP Metrics Monitoring 2021-04-19
    Assignee: Arun Kannawadi; Task appears on: DRP Metrics Monitoring 2021-03-01
  • Yusra AlSayyad: Do a rerun with w50 PS1 refcat and one with shrunk refcat errors.
    Assignee: Yusra AlSayyad; Task appears on: DRP Metrics Monitoring 2021-01-04
  • Jeffrey Carlin: Add an absolute astrometry match-to-refcat metric to dashboard DM-34153
    Assignee: Jeffrey Carlin; Task appears on: DRP Metrics Monitoring 2021-01-04