...

1. I'm saying "file" rather than "Dataset" here (and below) both to provide a clearer example and because I think the distinction between concrete and immediate virtual composites is how we'll want to implement single-file vs. multi-file Exposures.  This design doesn't actually guarantee that a Datastore will write a concrete composite as a single file (after all, it could even write some or all of it to a SQL database instead), but Datastores that do write files shouldn't need to be able to split concrete composites into multiple files themselves.  However, the design does guarantee that an immediate virtual composite will not be written as a single file.

Simplified Exposure as an Example

I'll use the Exposure StorageClass for most examples.  For the purposes of this design, we'll assume it has the following simplified definition (pseudocode; not actual APIs):

Code Block
Wcs = StorageClass(...)

Image = StorageClass(...)


MaskedImage = StorageClass(
    components={
        "image": Image,
        "variance": Image
    },
    ...
)

Exposure = StorageClass(
    components={
        "maskedImage": MaskedImage,
        "wcs": Wcs,
        "image": "maskedImage.image",        # aliases to more deeply-nested components
        "variance": "maskedImage.variance"
    },
    ...
)
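
The alias entries above ("image" pointing at "maskedImage.image") imply some lookup step that follows a dotted path down to a nested component. The following is a minimal runnable sketch of that idea, not the actual API: the `StorageClass` dataclass, its `name` field, and the `resolve` method are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class StorageClass:
    """Hypothetical stand-in for the design's StorageClass (not a real API)."""

    name: str
    # A component is either a StorageClass or a dotted alias string
    # such as "maskedImage.image".
    components: Dict[str, Union["StorageClass", str]] = field(default_factory=dict)

    def resolve(self, path: str) -> "StorageClass":
        """Follow a dotted component path, expanding aliases along the way."""
        head, _, rest = path.partition(".")
        target = self.components[head]
        if isinstance(target, str):
            # Alias: resolve the aliased path starting from this StorageClass.
            target = self.resolve(target)
        return target.resolve(rest) if rest else target


Wcs = StorageClass("Wcs")
Image = StorageClass("Image")
MaskedImage = StorageClass("MaskedImage", {"image": Image, "variance": Image})
Exposure = StorageClass(
    "Exposure",
    {
        "maskedImage": MaskedImage,
        "wcs": Wcs,
        "image": "maskedImage.image",  # alias to a more deeply-nested component
        "variance": "maskedImage.variance",
    },
)

print(Exposure.resolve("image").name)  # Image
```

Under this sketch, "image" and "maskedImage.image" resolve to the same StorageClass object, which is what lets callers ignore how deeply a component is nested.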


Principles

  • All Datasets have an entry in the Registry's Dataset table.  This implies that a composite Dataset is more than the sum of its components: it also includes (Registry) information to associate them.
  • Any provenance graph (both what's recorded after processing and the QuantumGraph produced by Preflight) must contain nodes for both composites and components, because:
    • a SuperTask may consume only some components of a composite, so all component nodes must be in the graph;
    • a deferred virtual composite must be created explicitly, so it cannot be considered implicit in the graph;
    • whether a particular composite DatasetType is defined as concrete, virtual, immediate, or deferred is considered hidden from SuperTasks, so we cannot include just some composite nodes in the graph.
  • All information needed to read a Dataset is saved at the level of the Dataset (in some combination of Registry and Datastore).  No information necessary for reading a Dataset is stored at a Datastore-wide or Registry-wide level, and it should never be necessary to configure a Butler a certain way in order to read something.
  • It must be possible to change whether a particular DatasetType is written as concrete, virtual, immediate, or deferred by changing only the Butler/Datastore configuration provided when initializing a client.
    • As a result, any composite StorageClass must be writeable as concrete (with virtual components), immediate virtual, or deferred virtual; the StorageClass itself shall not be specialized to one of these choices.
    • No Registry content may be changed when controlling how a composite DatasetType is written.

Permitted Combinations

  • A virtual component must be a part of exactly one concrete composite.  For example, if Exposure A and Exposure B are each written as a single file, then A.wcs cannot also be a component of B.
  • A virtual component may be a part of one or more deferred virtual composites.  For example, if Exposure A and Exposure B are each written as a single file, then an Exposure C may be defined such that C.wcs = A.wcs and C.maskedImage = B.maskedImage.
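
The two rules above can be sketched as a small validity check. This is only an illustration under stated assumptions: the function name `check_combinations`, the dictionary layout, and the use of strings such as "A.wcs" as component Dataset identifiers are all hypothetical, and the sketch assumes every component referenced by a deferred virtual composite is a virtual component of some concrete composite.

```python
from collections import Counter
from typing import Dict, List


def check_combinations(
    concrete: Dict[str, Dict[str, str]],
    deferred: Dict[str, Dict[str, str]],
) -> List[str]:
    """Return a list of rule violations (empty if the layout is permitted).

    `concrete` maps each concrete composite to {component name: component ID};
    `deferred` does the same for deferred virtual composites.
    """
    # Count how many concrete composites claim each virtual component.
    owners = Counter(ds for comps in concrete.values() for ds in comps.values())
    errors = [
        f"{ds} belongs to {n} concrete composites"
        for ds, n in owners.items()
        if n != 1
    ]
    # Deferred virtual composites may share components freely, but (in this
    # sketch) each one must already exist inside some concrete composite.
    for name, comps in deferred.items():
        for comp, ds in comps.items():
            if ds not in owners:
                errors.append(f"{name}.{comp} refers to unknown component {ds}")
    return errors


concrete = {
    "A": {"wcs": "A.wcs", "maskedImage": "A.maskedImage"},
    "B": {"wcs": "B.wcs", "maskedImage": "B.maskedImage"},
}
deferred = {"C": {"wcs": "A.wcs", "maskedImage": "B.maskedImage"}}
print(check_combinations(concrete, deferred))  # []

# Not permitted: A.wcs claimed by two concrete composites.
bad = {"A": {"wcs": "A.wcs"}, "B": {"wcs": "A.wcs"}}
print(check_combinations(bad, {}))  # ['A.wcs belongs to 2 concrete composites']
```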

Configuration

Registry

When a DatasetType with a composite StorageClass is declared to a Registry, DatasetTypes for each of the named components are also declared, with names constructed as "{parent-dataset-type-name}.{component-name}", the same DataUnit types as the parent, and StorageClasses defined by the parent StorageClass.

For example (pseudocode):

Code Block
# Given:
Image = StorageClass(...)


Wcs = StorageClass(...)


Exposure = StorageClass(
    components={
        "image": Image,
        "wcs": Wcs,
    },
    ...
)


# then this line:
registry.registerDatasetType(CalExp=DatasetType(StorageClass=Exposure, DataUnits=("Visit", "Sensor")))


# Effectively also does:
# registry.registerDatasetType(CalExp.image=DatasetType(StorageClass=Image, DataUnits=("Visit", "Sensor")))
# registry.registerDatasetType(CalExp.wcs=DatasetType(StorageClass=Wcs, DataUnits=("Visit", "Sensor")))
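
The pseudocode above can be turned into a small runnable sketch of the naming rule. The classes and the `register_dataset_type` method below are hypothetical stand-ins, not the real Registry interface, and the sketch only handles one level of components (nested components and aliases are left out).

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class StorageClass:
    """Hypothetical stand-in: a name plus named component StorageClasses."""

    name: str
    components: Dict[str, "StorageClass"] = field(default_factory=dict)


@dataclass
class DatasetType:
    """Hypothetical stand-in pairing a StorageClass with DataUnit types."""

    storage_class: StorageClass
    data_units: Tuple[str, ...]


class Registry:
    """Toy in-memory Registry holding only DatasetType declarations."""

    def __init__(self) -> None:
        self.dataset_types: Dict[str, DatasetType] = {}

    def register_dataset_type(self, name: str, dataset_type: DatasetType) -> None:
        self.dataset_types[name] = dataset_type
        # Declare "{parent}.{component}" with the parent's DataUnit types and
        # the StorageClass that the parent StorageClass assigns to it.
        for comp_name, comp_sc in dataset_type.storage_class.components.items():
            self.dataset_types[f"{name}.{comp_name}"] = DatasetType(
                comp_sc, dataset_type.data_units
            )


Image = StorageClass("Image")
Wcs = StorageClass("Wcs")
Exposure = StorageClass("Exposure", {"image": Image, "wcs": Wcs})

registry = Registry()
registry.register_dataset_type("CalExp", DatasetType(Exposure, ("Visit", "Sensor")))
print(sorted(registry.dataset_types))  # ['CalExp', 'CalExp.image', 'CalExp.wcs']
```

Registering the components at declaration time, rather than lazily on first write, keeps the Registry content independent of how a Datastore later chooses to write the composite.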

As we'll see below, these component DatasetTypes will be used by virtual components of concrete composites and concrete components of immediate virtual composites, but will not be used for components of deferred virtual composites (because those will have already been written and added to the Registry using some other DatasetType).

Datastore/Butler

...

To configure a composite X to be written as a concrete composite with virtual components:

Writing Datasets

todo

Reading Datasets

...