At the DM Leadership Team Virtual Face-to-Face Meeting, 2019-02-26 to -28, the DMLT discussed the recommendations of the QA Working Group (as extracted from DMTN-085) with the aim of prioritizing them and deciding which required action.

In general, the notes below are written as suggestions regarding relative priorities and responsible teams, rather than as to-do items for specific individuals (with a couple of exceptions). We should review key recommendations at a future DMLT meeting and check that work is being prioritized in line with the DMLT recommendations.

QAWG-REC-1: Adopt the definitions of QA-related terms in the DMTN-085 glossary subsystem-wide

  • This glossary should be audited for correctness and for clashes with higher LSST glossaries. Perhaps via an RFC?
  • Following that process, the DMLT agrees with this recommendation.
  • We suggest the DM-SST (Leanne Guy) be tasked with it.
  • Now complete: DM-20011.

QAWG-REC-2: Develop a new pipeline instrumentation and debugging system, replacing lsstDebug

QAWG-REC-3: Guidelines for the effective use of the pipeline debugging system should be supplied to developers

  • Accompanying documentation is implicit if we act on QAWG-REC-2.
  • Otherwise, the DMLT agrees that refreshed documentation for the existing system is appropriate.
  • This is a matter for Pipelines (John Swinbank, Yusra AlSayyad).

QAWG-REC-4: Debugging mode should be binary: it is either enabled or disabled, with no further configuration

  • This recommendation was considered to be part of QAWG-REC-2 & 3, and was not considered separately.

QAWG-REC-5: A log aggregation and monitoring service should be provided for large-scale processing jobs at the Data Facility

QAWG-REC-6: Tutorial and reference documentation for developers attempting to run jobs at scale should be refreshed

  • The DMLT considers that this should be a high priority for the LDF team after the new middleware is in place. (Margaret Gelman)

QAWG-REC-7: DM should formally adopt the PyViz ecosystem

  • The Architecture team was tasked with reporting on the possibility of interfacing PyViz with existing tools, and, in particular, with the LSP. (Kian-Tat Lim)

QAWG-REC-8: DM should adopt Dask to enable users to work with larger than memory data

  • This is already possible for internal users (i.e., DM developers) thanks to SQuaRE.
  • Contact Frossie Economou for details.

QAWG-REC-9: DM should provide clear, written guidance to developers about the availability, status and expected usage of image display tools

  • The DMLT did not regard this recommendation as sufficient to address outstanding use cases.
  • It was agreed that a new working group should be convened to further address this topic.
  • Wil O'Mullane — convene an “image display working group”. 

QAWG-REC-10: The design and implementation of the provenance system should have high priority in the project scheduling

  • The DMLT is not confident that the existing Gen 3 middleware effort will adequately address all relevant use cases.
  • However, there was no appetite for further work before the Gen 3 effort has fully converged; at that point, further investigation or work may be necessary.

QAWG-REC-11 & QAWG-REC-12: Obsolete and unclear sections of the Developer Guide should be rewritten to provide clearer guidance on unit tests & The Developer Guide should be expanded to provide checklist-style documentation for code reviewers making clear what is expected from them during the review.

  • The Architecture team was asked to refresh the Developer Guide. (Kian-Tat Lim)

QAWG-REC-13: Provide a central location where examples, scripts and utilities which are not fundamental to pipeline execution are indexed and made discoverable

  • This is already on the SQuaRE radar (DM-15807).
  • Frossie Economou notes the weight that the QAWG and DMLT give to this ticket, but it is not currently scheduled for immediate action.

QAWG-REC-14: The Project should adopt a documented (in the Developer Guide) policy on the maintenance of example code

  • The DMLT agrees that a robust approach to broken examples is appropriate.
  • The caveat that some examples may simply be didactic, and were never expected to work, was noted.
  • This should be rolled into SQuaRE updates to the documentation system (Frossie Economou).

QAWG-REC-15: The Project should prioritize the development of a documentation system which makes it convenient to include code examples and that tests those examples as part of a documentation build

QAWG-REC-16: When running regularly scheduled (timer) jobs on the master branch of any releasable product, any build failure should be announced prominently to key stakeholders

  • The DMLT regards this as important.
  • Work is currently underway in SQuaRE. (Frossie Economou)

QAWG-REC-17: The Developer Guide should provide guidance about expected responses to Jenkins failures

QAWG-REC-18 & QAWG-REC-19: The versions of external packages used in the Jenkins system must always correspond to the minimum versions specified in stub packages and/or in the documented list of prerequisites & The project should adopt a single source of dependency information and versions

  • Widely agreed, and effectively done or currently in progress.

QAWG-REC-20 & QAWG-REC-21: A standardized format for dataset repositories should be adopted across DM & Each dataset should have an explicitly named product owner

  • The DMLT considers that the effort required to design a standardized format and then make all existing repositories adhere to it is too great.
  • The idea of product owners was accepted, except the DMLT requests they be called “dataset owners” to avoid any possible ambiguity.
  • Simon Krughoff (SQuaRE) will act as a centralized point of contact for information about datasets.
  • John Swinbank & Simon Krughoff can collaborate on figuring out named owners for other datasets.

QAWG-REC-22: Datasets may be stored on either shared filesystems or Git LFS as appropriate, depending on the total size of the dataset

  • Nothing here to discuss.

QAWG-REC-23: A standardized test package design should be developed which addresses all existing use cases

QAWG-REC-24: A coherent plan for integration testing at all scales should be developed and published

QAWG-REC-25, QAWG-REC-26, & QAWG-REC-35: Formalise the lsst.verify.metrics system as the source of truth for metric definitions, by e.g. describing it in LDM-503 and LDM-639, Provide a high-level overview and data-model describing the metric definition system, & Provide a single, reliable source of documentation describing the SQuaSH system and a vision for its use in DM-wide metric tracking.

  • The DMLT agreed with the thrust of these recommendations, although detailed implementation was unclear.
  • It was agreed that John Swinbank should convene a small group of Pipelines & SQuaRE developers to discuss further.
  • John Swinbank — convene a mini-working group to refine the design of lsst.verify.metrics.  

QAWG-REC-27 through QAWG-REC-31

  • These provide guidance to middleware developers, to the lsst.verify.metrics mini-WG (above), and to the development of QA drill-down tooling (below). They are not independently actionable.

QAWG-REC-32: Develop clear guidelines for integrating metric collection with pipeline code

  • This is addressed by DMTN-098; it's not clear that further action is required here for now.

QAWG-REC-33: Pipelines leadership should start using the metric definition and collection system

QAWG-REC-34: SQuaSH should issue alerts to developers and key stakeholders on regressions in important metric values

  • Already addressed by Chronograf.

QAWG-REC-36: The SQuaSH system should be closely coupled to the drill-down environment; in particular, the former should use the latter to enable drill-down functionality into particular metric values

  • May not be trivial, given the Chronograf architecture.
  • We should read this as guidance to developers of both systems, rather than a specific requirement.

QAWG-REC-37: It must be possible to submit metrics to SQuaSH from arbitrary pipeline execution environments.

QAWG-REC-38: SQuaSH should be able to store and display appropriate metric values per DataId

QAWG-REC-39 & QAWG-REC-40: DM should develop a browser-based interactive dashboard that can run on any pipeline output repository (or comparison of two repositories) to quickly diagnose the quality of the data processing & The dashboard should enable the analyst to start a Jupyter notebook session with the relevant datasets already loaded. 

  • We are currently in negotiations with an external contractor (Quansight) about the development of such a tool.
  • This effort is being led by Tim Morton (Pipelines), but with input from the DM-SST, SQuaRE, and others as necessary.


2 Comments

  1. Added labels so that actions on this page will show up in DMLT action summaries.

  2. Hi John Swinbank, regarding QAWG-REC-34: DM-18191 will address the integration of Chronograf/Kapacitor and squarebot-jr to handle notifications. The other SQuaSH-related recommendations are in DM-17034.