This page has been completely rewritten from its original form, and the terminology has changed.  In particular, because we've settled on an approach in which all datasets always have an entry in the Registry's Dataset table, we have repurposed "virtual" to mean something different.

Terminology

composite: a Dataset or DatasetType whose StorageClass defines a set of discrete named child datasets, called components

parent: synonym for composite

component: a Dataset or DatasetType that may be accessed as a child of a composite (in some cases may also be accessed in other ways)

child: synonym for component

virtual: a Dataset or DatasetType that is defined by its relationship to one or more other Datasets/DatasetTypes.

concrete: not virtual

immediate: all content is written in a single call to Butler.put

deferred: content is written via multiple calls to Butler.put and associated later via a call to Butler.link.

This leads to four fundamental kinds of Dataset[Type]s:

  • concrete (always immediate, may be a component or a composite or neither): this is a regular Dataset that is itself written by a single Datastore.put call and read by a single Datastore.get call.  Writing a concrete composite also creates virtual components.  Examples: a WCS written on its own, an Exposure written all at once into a single file[1], or a WCS written to its own file when writing an Exposure to multiple files.
  • virtual component (always immediate): a Dataset defined automatically when its parent Dataset is written.  Example: the WCS of an Exposure written all at once into a single file[1].
  • immediate virtual composite: a Dataset written at once by recursively writing its components.  Example: an Exposure written by writing all of its components into separate files (which could include writing an immediate virtual composite MaskedImage that in turn writes concrete Image, Mask and Variance).
  • deferred virtual composite: a Dataset defined by associating preexisting Datasets that are interpreted as its components.  Example: an Exposure written by combining MaskedImage and PSF from a different Exposure with a WCS and PhotoCalib written originally as separate files. 

By design, the distinction between virtual and concrete is meaningful for both get and put, but the distinction between immediate and deferred is meaningful only for put.

1. I'm saying "file" rather than "Dataset" here both to provide a clearer example and because I think concrete composite vs. immediate virtual composites is how we'll want to implement single-file Exposure vs. multi-file Exposure.  This design doesn't actually guarantee that a Datastore will write a concrete composite as a single file (after all, it could even write some/all of it to a SQL database instead), but Datastores that actually do write files shouldn't need to be able to split up concrete composites into multiple files themselves.  However, the design does guarantee that an immediate virtual composite will not be written as a single file.
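
The four kinds above can be summarized in a short Python sketch. The names Definition, Kind, and classify are hypothetical and are not part of the Butler design; they only encode the rules stated in the list, namely that concrete Datasets and virtual components are always immediate, and that only virtual composites may be deferred.

```python
from enum import Enum


class Definition(Enum):
    """How a Dataset's content is defined (hypothetical names)."""
    OWN_CONTENT = "own content"      # concrete: one Datastore.put, one Datastore.get
    BY_PARENT = "by parent"          # virtual: defined when its parent is written
    BY_COMPONENTS = "by components"  # virtual: defined by its component Datasets


class Kind(Enum):
    """The four fundamental kinds of Dataset[Type]."""
    CONCRETE = "concrete"
    VIRTUAL_COMPONENT = "virtual component"
    IMMEDIATE_VIRTUAL_COMPOSITE = "immediate virtual composite"
    DEFERRED_VIRTUAL_COMPOSITE = "deferred virtual composite"


def classify(definition: Definition, deferred: bool = False) -> Kind:
    """Map (definition, immediate/deferred) onto one of the four kinds.

    Raises ValueError for combinations the text rules out.
    """
    if definition is Definition.BY_COMPONENTS:
        if deferred:
            return Kind.DEFERRED_VIRTUAL_COMPOSITE
        return Kind.IMMEDIATE_VIRTUAL_COMPOSITE
    if deferred:
        # Concrete Datasets and virtual components are always immediate.
        raise ValueError(f"{definition.value!r} Datasets cannot be deferred")
    if definition is Definition.OWN_CONTENT:
        return Kind.CONCRETE
    return Kind.VIRTUAL_COMPONENT
```

Because immediate versus deferred matters only for put, a reader of a Dataset would only ever need the Definition half of this classification.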

Principles

  • All Datasets have an entry in the Registry's Dataset table.  This implies that a composite Dataset is more than the sum of its components: it also includes (Registry) information to associate them.
  • Any provenance graph (both what's recorded after processing and the QuantumGraph produced by Preflight) must contain nodes for both composite and components, because:
    • a SuperTask may consume only some components of a composite, so all component nodes must be in the graph;
    • a deferred virtual composite must be created explicitly, so it cannot be considered implicit in the graph;
    • whether a particular composite DatasetType is defined as concrete, virtual, immediate, or deferred is considered hidden from SuperTasks, so we cannot include just some composite nodes in the graph.
  • All information needed to read a Dataset is saved at the level of the Dataset (in some combination of Registry and Datastore).  No information necessary for reading a Dataset is stored at a Datastore-wide or Registry-wide level, and it should never be necessary to configure a Butler a certain way in order to read something.
  • It must be possible to change whether a particular DatasetType is written as concrete, virtual, immediate, or deferred by changing only the Butler/Datastore configuration provided when initializing a client.
    • As a result, any composite StorageClass must be writeable as concrete (with virtual components), immediate virtual, or deferred virtual; the StorageClass itself shall not be specialized to one of these choices.
    • No Registry content may be changed when controlling how a composite DatasetType is written.
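
As a rough illustration of the last principle, the sketch below shows one StorageClass definition being written in three different ways depending only on a client-side configuration mapping. Everything here is an assumption made for illustration: the write_modes dictionary, the mode strings, and the planned_puts helper are hypothetical, since the actual Datastore/Butler configuration is not yet specified on this page.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StorageClass:
    """Minimal stand-in: a name plus named component StorageClasses."""
    name: str
    components: Dict[str, "StorageClass"] = field(default_factory=dict)


def planned_puts(dataset_type: str, storage_class: StorageClass,
                 write_modes: Dict[str, str]) -> List[str]:
    """List the DatasetType names that would each get one Datastore.put call.

    write_modes is hypothetical client configuration mapping a DatasetType
    name to "concrete", "immediate_virtual", or "deferred_virtual";
    unlisted names default to "concrete".
    """
    mode = write_modes.get(dataset_type, "concrete")
    if mode == "concrete":
        return [dataset_type]
    if not storage_class.components:
        raise ValueError(f"{dataset_type} has no components, so it must be concrete")
    if mode == "immediate_virtual":
        puts: List[str] = []
        for comp_name, comp_class in storage_class.components.items():
            # Components may themselves be composites, so recurse.
            puts.extend(planned_puts(f"{dataset_type}.{comp_name}", comp_class, write_modes))
        return puts
    if mode == "deferred_virtual":
        # Components already exist; they are associated later via Butler.link.
        return []
    raise ValueError(f"unknown write mode {mode!r}")


image = StorageClass("Image")
wcs = StorageClass("Wcs")
exposure = StorageClass("Exposure", {"image": image, "wcs": wcs})

single_file = planned_puts("CalExp", exposure, {})
multi_file = planned_puts("CalExp", exposure, {"CalExp": "immediate_virtual"})
linked_later = planned_puts("CalExp", exposure, {"CalExp": "deferred_virtual"})
```

The exposure object is identical in all three calls; only the configuration mapping differs, which is the property the principle asks for.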

Configuration

Registry

When a DatasetType with a composite StorageClass is declared to a Registry, DatasetTypes for each of the named components are also declared, with names constructed as "{parent-dataset-type-name}.{component-name}", the same DataUnits types as the parent, and StorageClasses defined by the parent StorageClass.

For example (pseudocode):

# Given:
Image = StorageClass(...)


Wcs = StorageClass(...)


Exposure = StorageClass(
    components={
        "image": Image,
        "wcs": Wcs,
    },
    ...
)


# then this line:
registry.registerDatasetType(CalExp=DatasetType(StorageClass=Exposure, DataUnits=("Visit", "Sensor")))


# Effectively also does:
# registry.registerDatasetType(CalExp.image=DatasetType(StorageClass=Image, DataUnits=("Visit", "Sensor")))
# registry.registerDatasetType(CalExp.wcs=DatasetType(StorageClass=Wcs, DataUnits=("Visit", "Sensor")))

As we'll see below, these component DatasetTypes will be used by virtual components of concrete composites and concrete components of immediate virtual composites, but will not be used for components of deferred virtual composites (because those will have already been written and added to the Registry using some other DatasetType).
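
The pseudocode above can also be written as a small runnable Python sketch. This is a toy in-memory model, not the real Registry: the classes, the register_dataset_type method, and the dataset_types dictionary are hypothetical stand-ins. Recursing into components that are themselves composites is an assumption that follows from applying the same rule to each declared component.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class StorageClass:
    """Minimal stand-in: a name plus named component StorageClasses."""
    name: str
    components: Dict[str, "StorageClass"] = field(default_factory=dict)


@dataclass
class DatasetType:
    name: str
    storage_class: StorageClass
    data_units: Tuple[str, ...]


class Registry:
    """Toy in-memory registry keyed by DatasetType name."""

    def __init__(self) -> None:
        self.dataset_types: Dict[str, DatasetType] = {}

    def register_dataset_type(self, dataset_type: DatasetType) -> None:
        self.dataset_types[dataset_type.name] = dataset_type
        # Declare "{parent}.{component}" for every named component, with the
        # parent's DataUnits and the StorageClass given by the parent's
        # StorageClass definition.
        for comp_name, comp_class in dataset_type.storage_class.components.items():
            self.register_dataset_type(
                DatasetType(
                    name=f"{dataset_type.name}.{comp_name}",
                    storage_class=comp_class,
                    data_units=dataset_type.data_units,
                )
            )


image = StorageClass("Image")
wcs = StorageClass("Wcs")
exposure = StorageClass("Exposure", {"image": image, "wcs": wcs})

registry = Registry()
registry.register_dataset_type(DatasetType("CalExp", exposure, ("Visit", "Sensor")))
registered_names = sorted(registry.dataset_types)
```

After the single registration call, registered_names holds the parent name and one name per component.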

Datastore/Butler

todo

Writing Datasets

todo

Reading Datasets

todo

