ButlerWG Meeting 2017-10-24

Meeting began 24 Oct 2017 @ 11am; ended 1:05pm.

Next meeting: 26 Oct 2017 @ 10am. Simon Krughoff will walk through a use case in the design, and we will continue with Michelle Gower's review.

Notes are extremely condensed because Jim Bosch vastly overestimated his ability to talk and take notes at the same time.

Discussed various design points and clarity issues brought up by Michelle Gower in her review of DMTN-056:

Responsibilities of Datastore vs. StorageClass:
- What sets the file format? Datastore does, with a dispatch on the StorageClass and the DatasetType.
- Can the file format used for a StorageClass/DatasetType combination change within a Datastore? Yes, but this means the Datastore has to record the file format used somehow; we can't put that information back in the Registry.
- Why is Datastore responsible for "disassembling" composites while StorageClass has "assemble"? This division of responsibility, while potentially confusing, seemed to provide the most flexibility for Datastores and Registries while still meeting our needs. Assembly and disassembly are not as symmetric as they seem, because users can demand composites be assembled but (usually) don't care if they're stored as separate files or not.
- Unknown User (pschella) to expand StorageClass description to emphasize that Datastore.put has to return URIs for all components defined by a StorageClass, regardless of whether the Dataset is stored as multiple files.
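
A rough sketch of the division of responsibility discussed above, not the DMTN-056 interface: ToyDatastore, choose_format, the formats table, the written record, and the URI layout are all hypothetical names invented for illustration. It shows a Datastore picking a file format by dispatching on StorageClass and DatasetType, keeping its own record of the format used, and returning a URI for every component whether or not the Dataset is split into multiple files.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class StorageClass:
    # Hypothetical stand-in: a name plus the component names it defines.
    name: str
    components: Tuple[str, ...] = ()


@dataclass
class ToyDatastore:
    # (StorageClass name, DatasetType name) -> file format.
    # A DatasetType of None is the fallback for that StorageClass.
    formats: Dict[Tuple[str, Optional[str]], str]
    root: str = "file:///toy"
    # File URI -> format actually used. Kept by the Datastore itself,
    # since this information cannot go back into the Registry.
    written: Dict[str, str] = field(default_factory=dict)

    def choose_format(self, storage_class: StorageClass, dataset_type: str) -> str:
        # Dispatch on the specific pair first, then on the StorageClass alone.
        specific = (storage_class.name, dataset_type)
        if specific in self.formats:
            return self.formats[specific]
        return self.formats[(storage_class.name, None)]

    def put(self, storage_class: StorageClass, dataset_type: str,
            disassemble: bool = False) -> Dict[str, str]:
        fmt = self.choose_format(storage_class, dataset_type)
        single_file = f"{self.root}/{dataset_type}.{fmt}"
        uris = {}
        for comp in storage_class.components:
            if disassemble:
                # One file per component.
                uris[comp] = f"{self.root}/{dataset_type}_{comp}.{fmt}"
                self.written[uris[comp]] = fmt
            else:
                # One file for the whole Dataset; components get fragment URIs.
                uris[comp] = f"{single_file}#{comp}"
        if not disassemble:
            self.written[single_file] = fmt
        # Every component has a URI either way.
        return uris
```

In this toy version the caller sees the same set of component keys from put whether or not the Datastore chose to disassemble, which is the asymmetry noted above: storage layout is the Datastore's private decision.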
Why do we record output provenance within Butler.put, but use Butler.markInputUsed for input provenance? We cannot know whether we'll actually use a dataset when we call get, but we do know we're writing it when we call put. Making it a single call for put allows provenance and storage to be atomic.
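
To make the put/markInputUsed asymmetry concrete, here is a toy sketch. ToyButler and its simplified signatures are assumptions for illustration only and are not the Butler API from DMTN-056; quanta and datasets are plain strings here.

```python
class ToyButler:
    def __init__(self):
        self.store = {}        # dataset name -> object (stands in for a Datastore)
        self.outputs = []      # (quantum, dataset) output provenance records
        self.inputs_used = []  # (quantum, dataset) input provenance records

    def put(self, quantum, name, obj):
        # We know the dataset is being written, so storage and output
        # provenance are handled together in a single call.
        self.store[name] = obj
        self.outputs.append((quantum, name))

    def get(self, name):
        # Reading records nothing: at this point we cannot know whether
        # the caller will actually use what it loaded.
        return self.store[name]

    def markInputUsed(self, quantum, name):
        # Called explicitly, and only for inputs that were really used.
        self.inputs_used.append((quantum, name))
```

For example, a quantum that loads two candidate inputs but uses only one would call get twice and markInputUsed once, so only the used input appears in its provenance.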

Jim Bosch to follow up with Michelle Gower on whether or not we should separate actual vs. predicted outputs (or have some other form of "optional" outputs) in our provenance tables, based on DESDM experience.

How do limited Registries support running individual SuperTasks with manually-provided inputs?
- Requires custom activator code for each SuperTask we want to support this way.
- Could move that code inside the concrete SuperTask, but need to make sure it still delegates to the code path that uses regular SuperTask entry points.
- Still need to work out what the limited Registry would need to do for that.
Are SuperTasks allowed to do metadata queries against the Registry in runQuantum or defineQuanta? They should not need to; all the metadata we think they'll need should be in the DataUnits they are given (that need is, in part, what defines what we put in DataUnits).

Jim Bosch and Unknown User (pschella) to expand example query in SuperTask description to include example query results, QuantumGraph, etc.
Jim Bosch to address To-Do at the top of SuperTask Pre-Flight section and expand it to include registering and retrieving DatasetTypes.
