Requirements
Metrics Sources
ap_verify must be able to extract and manage metrics from the following sources (not all of which may actually be provided by the final pipeline or metrics framework):
- Persisted intermediate and final pipeline products, such as catalogs or images
- The prompt product ("level 1") database
- lsst.verify.Job objects persisted at the CmdLineTask/SuperTask level
- Metadata returned at the CmdLineTask/SuperTask level
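The multi-source extraction above could be sketched as a dispatcher that merges per-source results into one table of values. Everything here is an illustrative placeholder (the source names, the extractor callables, and the metric names), not a real ap_verify or lsst.verify interface.

```python
# Hypothetical sketch: normalize measurements from the candidate sources
# into a single mapping of metric name -> value.
from typing import Callable, Dict

def collect_metrics(sources: Dict[str, Callable[[], Dict[str, float]]]) -> Dict[str, float]:
    """Merge metric values from several extractor callables.

    Later sources override earlier ones, so e.g. a value recomputed from a
    persisted catalog could supersede one scraped from task metadata.
    """
    merged: Dict[str, float] = {}
    for _name, extract in sources.items():
        merged.update(extract())
    return merged

# Toy extractors standing in for the four sources listed above.
sources = {
    "persisted_products": lambda: {"ap_assoc.numNewDiaObjects": 42.0},
    "level1_database": lambda: {"ap_assoc.totalDiaObjects": 1000.0},
    "verify_jobs": lambda: {"ip_diffim.fracDiaSourcesToSciSources": 0.01},
    "task_metadata": lambda: {"ap_pipe.runtime": 312.5},
}
metrics = collect_metrics(sources)
```

The merge order gives a natural precedence rule if the same metric is ever reported by more than one source.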
Error Handling
As a verification and testing framework, ap_verify should try to produce metrics even in the event of catastrophic pipeline failure. In particular, the sciVisitAlertFailure metric from Alert Production Metrics cannot be supported unless ap_verify recognizes and handles failed runs.
The exact policy for how much information ap_verify should preserve in which failure modes is TBD.
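One minimal way to make failed runs first-class is to trap per-visit failures instead of aborting, so the failure count itself becomes a measurement. The names here (run_with_failure_capture, flaky_pipeline) are made up for illustration, not the real ap_verify interface.

```python
# Sketch of the failure-handling idea: even if the pipeline raises,
# ap_verify records that the visit failed, so a metric like
# sciVisitAlertFailure can still be computed.
def run_with_failure_capture(run_pipeline, visits):
    """Run each visit, recording success/failure instead of aborting."""
    failures = []
    for visit in visits:
        try:
            run_pipeline(visit)
        except Exception as e:  # catastrophic pipeline failure
            failures.append((visit, repr(e)))
    # The fraction of failed visits is the raw input to a metric like
    # sciVisitAlertFailure.
    failure_fraction = len(failures) / len(visits) if visits else 0.0
    return failures, failure_fraction

def flaky_pipeline(visit):
    """Toy pipeline that crashes on even-numbered visits."""
    if visit % 2 == 0:
        raise RuntimeError(f"visit {visit} crashed")

failures, frac = run_with_failure_capture(flaky_pipeline, [1, 2, 3, 4])
```

How much diagnostic detail to keep per failure (here, just the repr of the exception) is exactly the TBD policy question above.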
Butler Repository Management
Are there any cases where the user of ap_verify will care where it creates input/output repositories, or are metrics the only product of interest?
Are there any cases where ap_verify would need to create Butler repositories that are not filesystem-based?
Database Management
ap_verify must be able to run pipeline implementations that use l1dbproto (or any successors?), as mentioned on DM-14273. However, it should also be runnable as a self-contained system.
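One way to read "self-contained" is a fallback database when no external one is configured; the standard library's sqlite3 makes this cheap. The schema below is a placeholder, not the real l1dbproto schema, and open_l1_database is a hypothetical name.

```python
# Sketch of self-contained operation: if no external level-1 database is
# configured, fall back to an in-process SQLite database.
import sqlite3
from typing import Optional

def open_l1_database(url: Optional[str]) -> sqlite3.Connection:
    """Open the configured database, or an in-memory one when unconfigured."""
    conn = sqlite3.connect(url if url else ":memory:")
    # Placeholder table, standing in for the real prompt-products schema.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS DiaObject "
        "(id INTEGER PRIMARY KEY, ra REAL, dec REAL)"
    )
    return conn

conn = open_l1_database(None)  # self-contained mode
conn.execute("INSERT INTO DiaObject VALUES (1, 150.1, 2.2)")
count = conn.execute("SELECT COUNT(*) FROM DiaObject").fetchone()[0]
```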
Use Cases
Regression Testing
Goal: Detect bugs introduced by code changes throughout the AP pipeline. Test data sets are small (running time in minutes), well-behaved, and possibly from multiple instruments.
Primary actor: Automated build system
Desired behavior: any pipeline failure is flagged as a problem in the Stack. Metrics are uploaded to SQuaSH in all cases for later QA monitoring.
Process:
- The build system builds ap_verify and all dependencies, and acquires the test data.
- For each data set:
  - The system runs ap_verify to completion or until failure.
  - Metrics are uploaded to SQuaSH.
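The loop above can be sketched as a driver script: a nonzero exit flags a Stack problem but does not stop the run, and the upload happens in every case. The command line and upload_to_squash callable are illustrative stand-ins, not the real interfaces.

```python
# Sketch of the regression-testing loop described above.
import subprocess

def regression_run(datasets, upload_to_squash):
    """Run each dataset to completion or failure; always upload metrics."""
    problems = []
    for dataset in datasets:
        result = subprocess.run(
            # Placeholder command; a real run would invoke ap_verify itself.
            ["echo", "ap_verify", "--dataset", dataset],
            capture_output=True,
            text=True,
        )
        if result.returncode != 0:
            problems.append(dataset)  # flag as a problem in the Stack
        upload_to_squash(dataset)     # metrics uploaded in all cases
    return problems

uploaded = []
problems = regression_run(["HiTS2015", "CI-HiTS2015"], uploaded.append)
```

Decoupling the upload from the exit status is what lets QA monitoring see partial results from failed runs.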
Software Verification
Goal: Test AP pipeline performance and handling of edge cases. Test data sets are mid-sized (running time in hours to days), and from multiple instruments.
Primary actors: ap_verify tester, pipeline analyst
Desired behavior: code failures are flagged as a problem in the Stack, while bad input data are handled gracefully (perfect discrimination not required).
Process:
- The tester installs ap_verify and its dependencies (either a weekly or an "all master branches" build may be useful, depending on what is being tested).
- The tester runs ap_verify on a pre-installed dataset, possibly using a script to manage arguments or concurrent processing.
- The analyst recovers metrics from disk and analyzes them using suitable scripts or tools (such tools are out of project scope).
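The "script to manage arguments or concurrent processing" could be as simple as fanning invocations out over dataIds with a worker pool. run_one and the dataId values are illustrative stand-ins for real ap_verify invocations.

```python
# Sketch of a concurrency wrapper for mid-sized verification runs.
from concurrent.futures import ThreadPoolExecutor

def run_one(data_id):
    """Stand-in for one ap_verify invocation on a single dataId."""
    return data_id, "ok"

# Placeholder dataIds; a real script would enumerate them from the dataset.
data_ids = [{"visit": v, "ccd": c} for v in (100001, 100002) for c in range(3)]

with ThreadPoolExecutor(max_workers=4) as pool:
    # map preserves input order, which keeps results easy to correlate
    # with their dataIds afterward.
    results = list(pool.map(run_one, data_ids))
```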
Large Data Demo
Goal: Test AP pipeline performance at scale. Test data may consist of entire surveys, requiring high-end computing resources.
Primary actor: Pipeline operator
Desired behavior: pipeline failures should be handled as they would be in production – both bad data and bugs must be handled gracefully
Process:
- The operator prepares (downloads, mounts, etc.) the data.
- The operator installs a specific version of ap_verify and its dependencies.
- The operator calls ap_verify through a work management system.
- The operator recovers metrics from disk and analyzes them using suitable scripts or tools (such tools are out of project scope).
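The metric-recovery step above amounts to sweeping a directory tree for persisted metric files and tabulating values. The file naming and the flat "measurements" structure below are simplified placeholders, not the actual lsst.verify.Job schema.

```python
# Sketch of a post-run metric-recovery script.
import json
import pathlib
import tempfile

def summarize_metrics(root: pathlib.Path):
    """Collect metric values from all persisted JSON files under root."""
    summary = {}
    for path in sorted(root.glob("**/*.verify.json")):
        data = json.loads(path.read_text())
        for name, value in data.get("measurements", {}).items():
            summary.setdefault(name, []).append(value)
    return summary

# Build a toy output directory with two runs' worth of metrics.
root = pathlib.Path(tempfile.mkdtemp())
(root / "run1.verify.json").write_text(
    json.dumps({"measurements": {"ap_pipe.runtime": 310.0}})
)
(root / "run2.verify.json").write_text(
    json.dumps({"measurements": {"ap_pipe.runtime": 295.0}})
)
summary = summarize_metrics(root)
```

Analysis of the resulting summary (plots, regressions across runs) is exactly the tooling declared out of project scope above.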
Operations Drill
To my knowledge, nobody has expressed interest in running ap_verify on individual visits as they are added to a repository, rather than as batch data sets.
Design Notes
Metrics Sources
Some of the data sources will be ap_verify's responsibility; others may be provided by ap_pipe. In the latter case, the relevant data must be communicated from ap_pipe to ap_verify.
- Persisted products can be retrieved from output repositories using the Butler. Since standalone tasks are expected to accept repositories as input instead of creating them, ap_verify chooses the repository URIs (and therefore knows which ones to query). However, the current Butler interface makes it difficult to make queries against "all" dataIds, which is normally what is of interest to ap_verify.
- The database location may be something specified by ap_verify (as part of the "environment" for a pipeline run), or an implementation detail of ap_pipe. In the latter case, ap_pipe must give ap_verify some kind of handle for accessing the database; a prototype for this role is available in the current system as lsst.ap.association.AssociationDBSqliteTask.
- Jobs are currently persisted independently of the Butler, and Eric/Krzysztof have hacked around this by asking that tasks report the location of persisted Jobs using a standardized metadata key. In the production system, any Jobs with task-specific metrics may be accessible via the Butler.
- Currently, ap_pipe reports metadata to ap_verify after each CmdLineTask is run. Metadata can in principle be extracted using the Butler, but this requires knowledge of the specific tasks called by ap_pipe, as well as implementation details of those tasks.
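The "standardized metadata key" workaround could look like the following: each task records where it wrote its Job file under an agreed-upon key, and ap_verify scans per-task metadata for that key. The key name "verify_json_path" and the plain-dict metadata layout are assumptions for illustration.

```python
# Sketch of harvesting persisted-Job locations from task metadata.
JOB_PATH_KEY = "verify_json_path"  # assumed standardized key name

def find_job_files(task_metadata):
    """Pull persisted-Job locations out of per-task metadata dicts."""
    paths = []
    for _task, meta in task_metadata.items():
        if JOB_PATH_KEY in meta:
            paths.append(meta[JOB_PATH_KEY])
    return paths

metadata = {
    "processCcd": {"verify_json_path": "output/processCcd.verify.json"},
    "imageDifference": {"runtime": 120.0},  # no Job persisted by this task
}
paths = find_job_files(metadata)
```

The appeal of this convention is that ap_verify needs no knowledge of which tasks produce Jobs; any task that opts in simply sets the key.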
Template Management
To ensure that ap_verify always uses templates compatible with the pipeline code, we may want ap_verify to be responsible for template generation at run time. An alternative is versioning datasets according to which Stack versions they are compatible with (see discussion on DM-12853).