Repository for ideas, discussion, and related information for the development of a package that computes metrics as pipeline tasks. This package is a Gen3 Butler/PipelineTask-based replacement for validate_drp.

Goals

To date, computation of metrics has been done with the validate_drp package. With the transition to Gen3 middleware, we want to develop a Gen3-based package for metric computation.

Design Considerations

Design considerations are outlined in DMTN-141, which is a work in progress.

References and Prior Art

Workplan

The F20 proof of concept is covered by Jira epic DM-24624. The initial goal was to demonstrate that this Gen3 metric-pipeline-task architecture could work as a replacement for validate_drp.

S21A work is being planned in Jira epic DM-26993.

Future work

List here ideas and thoughts for future development and desired functionality. These will be reviewed and, if selected for implementation, included in future Jira epics. 

Feature Request or Idea | Notes and Comments | Priority
Porting ap_verify

ap_verify already uses metric tasks; it is an online system.
2021-03-16: In ap_verify, data processing and metric computation are intertwined, so it is probably not suited to faro. Check with Eric Bellm.

Low 
Visualization of calculated metrics

validate_drp produces a few simple static histograms, output to PDFs when the package is run. We would like to develop a more dynamic visualization dashboard to interactively view plots, including, e.g., drill-down. How do we produce a useful frontend with diagnostic plots? Integration with T. Morton's QA explorer?

2021-03-16: Work with Tim, try to use his tools to visualize metrics, see if it responds to our needs. 

  1.  How to visualize scalars beyond just a time series of scalars
  2.  How to drill down into the data used to compute the scalar and inspect it.

Review the QA WG report. 

High 
User guide to help new contributors get up to speed

2021-03-16: Will be done ASAP, everything in place. Need to assign tasks to people.

Medium
Implement all remaining KPMs defined in the SRD

2021-03-16: Being planned in the DM-SST meeting. High priority in 2021.

In Progress (SST)
Multi-band data 

As summarized in DMTN-091, ci_hsc_gen3 has the ri bands, and validation_data_hsc has riy (but isn't Gen3-ified yet). We really need ugrizy, and we also currently have the issue that we have to frequently rebuild the repos.

2021-03-16: RC2 data has multiband, RC3 will have more. 


Done
Consider the granularity at which we should calculate metrics

Integration with T. Morton's QA dashboard to read/aggregate them?


2021-03-16: Faro gives us the ability to compute metrics at various scales. We should address this in the context of looking at metrics computed on a dataset. Follows on from the visualization item above.

Medium

What needs to happen for us to be able to use shared datasets on lsst-dev rather than rebuilding them with each weekly in our own directories?

2021-03-16: A: Gen2→Gen3 conversion, which is already being done for RC2. Problem solved!

Done

Check that filtering of matched catalogs removes only bad sources, and not the entire group (for various criteria)


validate_drp removes the entire group. In the initial PoC we did the same as validate_drp for comparison. After the PoC, we need to check this and fix it if it has not already been addressed.

2021-03-16: Also, we should be applying cuts to datasets before matching catalogs (currently the cuts are applied after catalog matching).

More generically - how do we consistently apply quality cuts / filters? (See the sketch below.)

High
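As a concrete illustration of the difference, here is a minimal sketch (using pandas, with hypothetical column names) of per-group filtering, as validate_drp does, versus per-source filtering, which keeps the good measurements from a group that contains one bad source:

```python
import pandas as pd

# Hypothetical matched-catalog layout: one row per source measurement,
# grouped into objects by "object_id"; "psf_mag" stands in for whatever
# quantity the quality cut is applied to.
matched = pd.DataFrame({
    "object_id": [1, 1, 1, 2, 2],
    "psf_mag":   [20.1, 20.2, 27.5, 19.8, 19.9],  # 27.5 fails the cut
})
cut = matched["psf_mag"] < 25.0

# validate_drp-style: discard the entire group if any member fails the cut.
per_group = matched.groupby("object_id").filter(
    lambda g: (g["psf_mag"] < 25.0).all())

# Proposed behavior: discard only the failing sources and keep the rest
# of the group (object 1 keeps two of its three measurements).
per_source = matched[cut]
```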

Filter/remove sky sources from consideration (in most metrics). e.g., they may be causing the regression in metrics seen in DM-25116.


We currently do not do any explicit checking. The quality cuts in use probably exclude sky sources, but this needs to be confirmed.

2021-03-16: See above - how do we consistently apply quality cuts / filters? (A sketch of an explicit sky-source cut follows below.)

High
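An explicit cut would be straightforward once the relevant flag is confirmed to be present in the tables we read. A minimal sketch, assuming the source table is a pandas DataFrame carrying a boolean sky_source column (the column name is an assumption):

```python
import pandas as pd

def remove_sky_sources(sources: pd.DataFrame) -> pd.DataFrame:
    """Drop sky sources (background placeholders) before computing metrics."""
    if "sky_source" in sources.columns:  # column name is an assumption
        return sources[~sources["sky_source"].astype(bool)]
    return sources  # no flag available: fall back to existing quality cuts
```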
Can we use this framework for real-time processing at the observatory and quick-look analyses? 

Talk to Chuck Claver


In progress on #rubinobs-sitcom
Comparison to external reference catalogs

E.g., Gaia, space-based imaging (HST), HSC-SSP, DECam, spectroscopic catalogs. (A matching sketch follows the list below.)

Interesting astrophysical objects:

  • Luminous red galaxy catalogs
  • QSOs
  • Known variable stars
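A minimal sketch of the matching step, using astropy's SkyCoord.match_to_catalog_sky with placeholder coordinate arrays and an assumed 0.5 arcsec match tolerance:

```python
import numpy as np
import astropy.units as u
from astropy.coordinates import SkyCoord

# Placeholder coordinates; in practice these come from the measured source
# table and the external reference catalog (e.g., Gaia).
rng = np.random.default_rng(0)
src = SkyCoord(ra=rng.uniform(150, 151, 100) * u.deg,
               dec=rng.uniform(2, 3, 100) * u.deg)
ref = SkyCoord(ra=rng.uniform(150, 151, 500) * u.deg,
               dec=rng.uniform(2, 3, 500) * u.deg)

idx, sep2d, _ = src.match_to_catalog_sky(ref)  # nearest reference per source
good = sep2d < 0.5 * u.arcsec                  # tolerance is an assumption
offsets_mas = sep2d[good].to(u.mas)            # e.g., astrometric residuals
```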

Performance as a function of focal plane position

Performance as a function of system telemetry/metadata (e.g., airmass, image quality)

Metrics computed on spatial scales different from tract and patch

We might not want to be tied to scales that are configured for data processing. Examples include ellipticity correlations (see the sketch below).
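For example, ellipticity correlation functions are naturally binned by angular separation rather than by tract or patch. A minimal sketch using treecorr with synthetic shear values (all inputs are placeholders):

```python
import numpy as np
import treecorr

# Synthetic positions and ellipticity components standing in for a real catalog.
rng = np.random.default_rng(0)
n = 1000
cat = treecorr.Catalog(ra=rng.uniform(150.0, 151.0, n),
                       dec=rng.uniform(2.0, 3.0, n),
                       g1=rng.normal(0.0, 0.05, n),
                       g2=rng.normal(0.0, 0.05, n),
                       ra_units="deg", dec_units="deg")

# Shear-shear correlation binned by angular separation, not by tract/patch.
gg = treecorr.GGCorrelation(min_sep=0.5, max_sep=60.0, nbins=10,
                            sep_units="arcmin")
gg.process(cat)
# gg.meanr, gg.xip, gg.xim give xi_plus/xi_minus versus separation.
```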
Performance for injected sources

Bootcamp/hackathon to teach people to write metrics

Focused, hands-on 1-2 day bootcamp to get people implementing metrics. Get all the SRD/DMSR KPMs implemented. Get the commissioning team integrated.

2021-03-17: Should complete the user guide first.

Medium
Review algorithms

Many of the metrics were ported as-is from validate_drp, where the algorithms were never reviewed. Are the current algorithms the way we want to be, or should be, calculating these metrics?

Done - DM-SST meeting 2021-02-22
Compare multiple methods of measuring a given quantity

For example, compare aperture and PSF photometry for stars, PSF and extended-model photometry for stars, or PSF photometry computed from the coadd and from individual epochs. These analyses likely involve multiple dataset types (see the sketch below).
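A minimal sketch of one such comparison, PSF versus aperture photometry, with assumed flux columns in nJy and robust scalar summaries of the kind a metric measurement would report:

```python
import numpy as np
from scipy.stats import median_abs_deviation

def psf_minus_ap_mag(psf_flux_njy, ap_flux_njy):
    """Per-star PSF-minus-aperture magnitude difference (fluxes in nJy)."""
    return -2.5 * np.log10(np.asarray(psf_flux_njy) / np.asarray(ap_flux_njy))

def summarize(delta_mag):
    """Robust scalar summaries suitable for reporting as metric values."""
    return {
        "median_mmag": 1e3 * np.median(delta_mag),
        "sigma_mad_mmag": 1e3 * median_abs_deviation(delta_mag, scale="normal"),
    }
```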
Metrics based on a query of a "big table" rather than specific units of data

Not sure this is well posed yet, but there might be useful metrics that are more efficiently computed as a query for rare objects and/or statistical samples of objects.
Use provenance to extract dataIds

Can we use the provenance system to extract the dataIds used to compute a metric, rather than storing them?
Creating aggregated quantities

Currently we compute a mean or median but not a histogram of the values. Aggregation is expensive if rerun every time we want to produce a plot; can we define and persist aggregated quantities? If so, what is the optimal granularity of binning? (See the sketch below.)
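One option, sketched below with numpy, is to persist histograms as fixed bin edges plus counts: aggregating persisted histograms is then element-wise addition, so plots over arbitrary subsets never require a rerun. The bin edges here are an arbitrary choice, illustrating exactly the granularity question raised above:

```python
import numpy as np

# Fixed bin edges shared by all persisted histograms; the binning granularity
# is the open question.
BIN_EDGES = np.linspace(15.0, 25.0, 41)  # 0.25 mag bins, chosen arbitrarily

def make_histogram(values):
    """Persistable aggregate: counts on the shared bin edges."""
    counts, _ = np.histogram(values, bins=BIN_EDGES)
    return counts

# Combining persisted histograms is just element-wise addition.
rng = np.random.default_rng(0)
tract_a = make_histogram(rng.normal(20.0, 1.0, 5000))
tract_b = make_histogram(rng.normal(21.0, 1.0, 5000))
combined = tract_a + tract_b
```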
Run on DC2

Currently, Eric M. is regularly reprocessing DC2 data with Gen2, but it is not being converted to Gen3. Eric N. (from Operations) has scripts to do the conversion, but if we want the repos on NCSA machines, somebody from Construction should probably be doing the conversion. Would be great to have DC2 available.
Run on all 3 RC2 tracts in a single execution

Current running is piecewise, tract by tract; we want to run all tracts in a single execution.

SQL-like selections on input datasets 

e.g., magnitude < 12 (see the sketch below)
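If input catalogs are exposed as DataFrames, pandas already supports this style of selection. A minimal sketch, with the selection string standing in for something that could live in a task configuration (column names are hypothetical):

```python
import pandas as pd

# Toy catalog with hypothetical column names.
catalog = pd.DataFrame({"magnitude": [11.2, 12.7, 10.9],
                        "extendedness": [0.0, 0.0, 1.0]})

# SQL-like selection expression, e.g., supplied via configuration.
selection = "magnitude < 12 and extendedness == 0"
selected = catalog.query(selection)
```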


