...
- All Datasets have an entry in the Registry's Dataset table. This implies that a composite Dataset is more than the sum of its components: it also includes (Registry) information to associate them.
- Any provenance graph (both what's recorded after processing and the QuantumGraph produced by Preflight) must contain nodes for both composite and components, because:
  - a SuperTask may consume only some components of a composite, so all component nodes must be in the graph;
  - a deferred virtual composite must be created explicitly, so it cannot be considered implicit in the graph;
  - whether a particular composite DatasetType is defined as concrete, virtual, immediate, or deferred should not affect what processing is done, so we cannot include just some composite nodes in the graph.
- All information needed to read a Dataset is saved at the level of the Dataset (in some combination of Registry and Datastore). No information necessary for reading a Dataset is stored at a Datastore-wide or Registry-wide level, and it should never be necessary to configure a Butler in a particular way in order to read something.
- It must be possible to change whether a particular DatasetType is written as a concrete composite or an immediate virtual composite by changing only the Butler/Datastore configuration provided when initializing a client. The same is true for configuring a Dataset to be a deferred virtual composite, though of course this also requires Butler users (e.g. SuperTasks) to use `link` as well as `put` to create it.
- As a result, any composite StorageClass must be writeable as concrete (with virtual components), immediate virtual, or deferred virtual; the StorageClass itself shall not be specialized to one of these choices.
- It should not be necessary to change persistent Registry or Datastore state (e.g. database tables or config files stored with the Datastore) in order to change how a composite DatasetType is written. Changing the Butler/Datastore client configuration should be sufficient, though of course some configurations may be rejected by certain Datastores, and Datastores may provide persistent defaults for that configuration.
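
Two of the constraints above can be illustrated with a minimal Python sketch: the graph contains the composite and all of its components regardless of how the composite is written, and the write mode comes only from client-side configuration. Everything in it is hypothetical and is not the actual Butler API: the `CompositeMode` enum, the `StorageClass` stand-in, the `graph_nodes` and `write_mode` helpers, the `"composites"` config key, the `calexp`/`Exposure` names, the dotted component naming, and the default of `concrete` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class CompositeMode(Enum):
    """Ways a composite may be written (terms taken from the list above)."""
    CONCRETE = "concrete"
    IMMEDIATE_VIRTUAL = "immediate_virtual"
    DEFERRED_VIRTUAL = "deferred_virtual"


@dataclass(frozen=True)
class StorageClass:
    """Hypothetical stand-in: names the components, but not the write mode."""
    name: str
    components: tuple


def graph_nodes(dataset_type: str, storage_class: StorageClass) -> set:
    """Return graph nodes for the composite plus every component.

    The write mode is deliberately not a parameter, so it cannot
    change which nodes appear in the graph.
    """
    nodes = {dataset_type}
    nodes.update(f"{dataset_type}.{c}" for c in storage_class.components)
    return nodes


def write_mode(client_config: dict, dataset_type: str) -> CompositeMode:
    """Look up the write mode in client-side configuration only.

    Falling back to "concrete" is an assumption of this sketch.
    """
    name = client_config.get("composites", {}).get(dataset_type, "concrete")
    return CompositeMode(name)


# Illustrative names only; the StorageClass is identical in both cases.
exposure = StorageClass("Exposure", ("image", "mask", "variance"))
config_a = {"composites": {"calexp": "concrete"}}
config_b = {"composites": {"calexp": "deferred_virtual"}}

nodes = graph_nodes("calexp", exposure)
mode_a = write_mode(config_a, "calexp")
mode_b = write_mode(config_b, "calexp")
```

Switching from `config_a` to `config_b` changes only the mode returned by `write_mode`; the `StorageClass` and the set of graph nodes stay the same. In the deferred virtual case a real client would also need to call `link` in addition to `put`, which this sketch does not model.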
...