Requirements

Metrics Sources

ap_verify must be able to extract and manage metrics from the following sources (not all of which may actually be provided by the final pipeline or metrics framework):

  • Persisted intermediate and final pipeline products, such as catalogs or images
  • The prompt products ("Level 1") database
  • lsst.verify.Job objects persisted at the CmdLineTask/SuperTask level
  • metadata returned at the CmdLineTask/SuperTask level
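
As an illustration of what "extract and manage" could mean in practice (not a specification of the final interface), measurements from all of these sources might be funneled into a single lsst.verify.Job. The helper below is a hypothetical sketch; the file-based Job handoff is only one of the possible channels listed above.

import json

from lsst.verify import Job


def gather_metrics(job_files):
    """Merge per-task lsst.verify.Job files into one Job (hypothetical helper).

    Measurements that ap_verify computes itself, e.g. from persisted catalogs,
    the prompt products database, or task metadata, could be inserted into the
    returned Job before it is persisted or uploaded.
    """
    combined = Job()
    for path in job_files:
        with open(path) as stream:
            combined += Job.deserialize(**json.load(stream))
    return combined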

Error Handling

As a verification and testing framework, ap_verify should try to produce metrics even in the event of catastrophic pipeline failure. In particular, the sciVisitAlertFailure metric from Alert Production Metrics cannot be supported unless ap_verify recognizes and handles failed runs.

The exact policy for how much information ap_verify should preserve in which failure modes is TBD.
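
One possible shape for such a policy is sketched below. It is purely illustrative: run_pipeline, measure_metrics, and the ap_verify.pipelineFailed metric are placeholders, not existing ap_verify or ap_pipe interfaces.

import astropy.units as u

from lsst.verify import Job, Measurement


def run_pipeline(workspace):
    """Placeholder for the ap_pipe driver call; assumed to raise on failure."""
    raise NotImplementedError


def measure_metrics(workspace, job):
    """Placeholder that would add measurements from products, database, and metadata."""


def run_with_failsafe(workspace):
    """Run the pipeline, but persist whatever metrics exist even if it fails."""
    job = Job()
    try:
        run_pipeline(workspace)
    except Exception:
        # Record the failure itself so that failure-rate metrics (for example,
        # something like sciVisitAlertFailure) can still be computed for this run.
        job.measurements.insert(
            Measurement("ap_verify.pipelineFailed", 1 * u.dimensionless_unscaled))
        raise
    finally:
        # Whatever succeeded before the failure is still measured and written out.
        measure_metrics(workspace, job)
        job.write("ap_verify.verify.json")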

Butler Repository Management

Are there any cases where the user of ap_verify will care where it creates input/output repositories, or are metrics the only product of interest?

Are there any cases where ap_verify would need to create Butler repositories that are not filesystem-based?

Database Management

ap_verify must be able to run pipeline implementations that use l1dbproto (or any successors?), as mentioned on DM-14273. However, it should also be runnable as a self-contained system.
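
The self-contained half of this requirement might look like the sketch below, assuming ap_verify accepts an optional, externally managed database location and otherwise falls back to a local SQLite file (the function and argument names are hypothetical).

import os
import sqlite3


def get_l1_database(workspace, db_url=None):
    """Return a database location for a pipeline run (illustrative only).

    If the caller supplies a location (for example, an l1dbproto-managed
    server), use it; otherwise create a throwaway SQLite file inside the
    workspace so that ap_verify remains self-contained.
    """
    if db_url is not None:
        return db_url
    path = os.path.join(workspace, "association.db")
    sqlite3.connect(path).close()   # create an empty database file
    return "sqlite:///" + path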

Use Cases

Regression Testing

Goal: Detect bugs introduced by code changes throughout the AP pipeline. Test data sets are small (running time in minutes), well-behaved, and possibly from multiple instruments.

Primary actor: Automated build system

Desired behavior: any pipeline failure is flagged as a problem in the Stack. Metrics are uploaded to SQuaSH in all cases for later QA monitoring.

Process:

  1. The build system builds ap_verify and all dependencies, and acquires the test data.
  2. For each data set:
    1. The system runs ap_verify to completion or until failure.
    2. Metrics are uploaded to SQuaSH.
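
The per-dataset loop might look roughly like the following in the build system. The ap_verify and dispatch_verify command lines shown are assumptions about the eventual interface, not a specification, and the dataset names are examples.

import subprocess

SQUASH_URL = "https://squash.example.org/api"   # placeholder endpoint
DATASETS = ["ap_verify_ci_hits2015", "ap_verify_ci_cosmos_pdr2"]   # examples only

failures = []
for dataset in DATASETS:
    workspace = f"workspaces/{dataset}"
    # Run ap_verify to completion or until failure.
    run = subprocess.run(
        ["ap_verify.py", "--dataset", dataset, "--output", workspace])
    # Upload metrics to SQuaSH whether or not the pipeline succeeded;
    # lsst.verify's dispatch_verify.py is one possible upload path.
    subprocess.run(
        ["dispatch_verify.py", "--url", SQUASH_URL,
         f"{workspace}/ap_verify.verify.json"])
    if run.returncode != 0:
        failures.append(dataset)

if failures:
    raise RuntimeError("ap_verify failed on: " + ", ".join(failures))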

Software Verification

Goal: Test AP pipeline performance and handling of edge cases. Test data sets are mid-sized (running time in hours to days) and from multiple instruments.

Primary actors: ap_verify tester, pipeline analyst

Desired behavior: code failures are flagged as a problem in the Stack, while bad input data are handled gracefully (perfect discrimination not required).

Process:

  1. The tester installs ap_verify and its dependencies (either a weekly release or an "all master branches" build may be useful, depending on what is being tested).
  2. The tester runs ap_verify on a pre-installed dataset, possibly using a script to manage arguments or concurrent processing (a minimal sketch follows this list).
  3. The tester or analyst recovers metrics from disk and analyzes them using suitable scripts or tools (such tools are out of project scope).
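
A minimal example of the kind of driver script mentioned in step 2, assuming an ap_verify command line with --dataset/--output options (an assumption about the interface) and using a process pool for concurrency:

import subprocess
from concurrent.futures import ProcessPoolExecutor

DATASETS = ["ap_verify_hits2015", "ap_verify_cosmos_pdr2"]   # illustrative names


def run_one(dataset):
    """Run ap_verify on one dataset and capture its output."""
    return subprocess.run(
        ["ap_verify.py", "--dataset", dataset, "--output", f"workspaces/{dataset}"],
        capture_output=True, text=True)


if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=2) as pool:
        for dataset, result in zip(DATASETS, pool.map(run_one, DATASETS)):
            print(dataset, "succeeded" if result.returncode == 0 else "FAILED")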

Large Data Demo

Goal: Test AP pipeline performance at scale. Test data may consist of entire surveys, requiring high-end computing resources.

Primary actor: Pipeline operator

Desired behavior: pipeline failures should be handled as they would be in production; both bad data and bugs must be handled gracefully.

Process:

  1. The operator prepares (downloads, mounts, etc.) the data.
  2. The operator installs a specific version of ap_verify and its dependencies.
  3. The operator calls ap_verify through a work management system.
  4. The operator recovers metrics from disk and analyzes them using suitable scripts or tools (such tools are out of project scope).

Operations Drill

To my knowledge, nobody has expressed interest in running ap_verify on individual visits as they get added to a repository, rather than as batch data sets.

Design Notes

Metrics Sources

Some of the data sources will be ap_verify's responsibility; others may be provided by ap_pipe. In the latter case, the relevant data must be communicated from ap_pipe to ap_verify.

  • Persisted products can be retrieved from output repositories using the Butler. Since standalone tasks are expected to accept repositories as input instead of creating them, ap_verify chooses the repository URIs (and therefore knows which ones to query). However, the current Butler interface makes it difficult to query against "all" dataIds, which is normally what is of interest to ap_verify (see the sketch after this list).
  • The database location may be something specified by ap_verify (as part of the "environment" for a pipeline run), or an implementation detail of ap_pipe. In the latter case, ap_pipe must give ap_verify some kind of handle for accessing the database; a prototype for this role is available in the current system as lsst.ap.association.AssociationDBSqliteTask.
  • Jobs are currently persisted independently of the Butler, and Eric/Krzysztof have worked around this by asking that tasks report the location of persisted Jobs using a standardized metadata key. In the production system, any Jobs with task-specific metrics may be accessible via the Butler.
  • Currently, ap_pipe reports metadata to ap_verify after each CmdLineTask is run. Metadata can in principle be extracted using the Butler, but this requires knowledge of the specific tasks called by ap_pipe, as well as implementation details of those tasks.
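
For the first bullet, retrieval might look something like the following sketch. It uses the Gen 2 Butler API; the dataset type and dataId keys are illustrative and would differ by camera and pipeline configuration.

from lsst.daf.persistence import Butler


def get_all_diasrc_catalogs(repo):
    """Fetch every persisted DIASource catalog from an output repository.

    Illustrative workaround for the lack of an "all dataIds" query:
    enumerate the dataId keys explicitly, then get each dataset.
    """
    butler = Butler(repo)
    catalogs = []
    for visit, ccd in butler.queryMetadata("deepDiff_diaSrc", ("visit", "ccd")):
        dataId = {"visit": visit, "ccd": ccd}
        if butler.datasetExists("deepDiff_diaSrc", dataId):
            catalogs.append(butler.get("deepDiff_diaSrc", dataId))
    return catalogs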

Template Management

To ensure that ap_verify always uses templates compatible with the pipeline code, we may want ap_verify to be responsible for template generation at run time. An alternative is versioning datasets according to which Stack versions they are compatible with (see discussion on DM-12853).
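
If the dataset-versioning alternative is chosen, the compatibility information could be as simple as a lookup table that ap_verify consults before running. Everything below, including the dataset and version names, is hypothetical.

# Hypothetical compatibility table: each dataset records the Stack versions
# its templates were generated with.
TEMPLATE_COMPATIBILITY = {
    "ap_verify_hits2015": {"w_2018_10", "w_2018_14"},
}


def check_templates(dataset, stack_version):
    """Refuse to run if the dataset's templates are not registered for this Stack."""
    if stack_version not in TEMPLATE_COMPATIBILITY.get(dataset, set()):
        raise RuntimeError(
            f"Templates in {dataset} are not known to be compatible with {stack_version}")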
