...

```python
Wcs = StorageClass(...)

Image = StorageClass(...)

MaskedImage = StorageClass(
    components={
        "image": Image,
        "variance": Image
    },
    ...
)

Exposure = StorageClass(
    components={
        "maskedImage": MaskedImage,
        "wcs": Wcs,
        "image": "maskedImage.image",        # aliases to more deeply-nested components
        "variance": "maskedImage.variance"
    },
    ...
)
```
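The alias strings in Exposure (such as "maskedImage.image") name components of a nested StorageClass. The following is a minimal sketch of how such an alias could be followed. It assumes a hypothetical stand-in StorageClass that records only a name and a components mapping. The real class takes further arguments, which are elided above, and resolve() is an illustrative helper rather than part of any existing API.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class StorageClass:
    """Hypothetical stand-in: a name plus named components.

    Each component is either another StorageClass or a dotted alias string
    (e.g. "maskedImage.image") naming a more deeply nested component.
    """
    name: str
    components: Dict[str, Union["StorageClass", str]] = field(default_factory=dict)

    def resolve(self, component: str) -> "StorageClass":
        """Return the StorageClass of a direct or aliased component."""
        target = self.components[component]
        if isinstance(target, str):
            # Walk the dotted path one level at a time, starting from this class.
            current = self
            for part in target.split("."):
                current = current.resolve(part)
            return current
        return target


Wcs = StorageClass("Wcs")
Image = StorageClass("Image")
MaskedImage = StorageClass("MaskedImage", {"image": Image, "variance": Image})
Exposure = StorageClass("Exposure", {
    "maskedImage": MaskedImage,
    "wcs": Wcs,
    "image": "maskedImage.image",        # alias into MaskedImage
    "variance": "maskedImage.variance",
})

print(Exposure.resolve("image").name)  # Image
```

Because the alias path is resolved through resolve() itself, an alias could point at another alias. A cyclic alias would recurse forever, so a real implementation would presumably validate aliases when the StorageClass is defined.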

Terminology

composite: a Dataset or DatasetType whose StorageClass defines a set of discrete named child datasets, called components

...

  1. A virtual component must be a part of exactly one concrete composite.  For example, if Exposure A and Exposure B are each written as a single file, then A.wcs cannot also be a component of B.
  2. A virtual component may be a part of one or more deferred virtual composites.  For example, if Exposure A and Exposure B are each written as a single file, then an Exposure C may be defined such that C.wcs = A.wcs and C.maskedImage = B.maskedImage.
  3. A virtual component may be a part of at most one immediate virtual composite, but only indirectly: it must be a component of a concrete composite that is in turn a component of the immediate virtual composite.  For example, if Exposure D is an immediate virtual composite and its maskedImage is concrete, writing D writes D.maskedImage to a single file (probably¹), and then associates the "image" virtual component D.maskedImage.image with D as D.image.
  4. A concrete composite always contains virtual components.  For example, writing an Exposure A as a single file always implies that A.wcs is a valid dataset (though it may be permitted to be None/null).
  5. A concrete composite may not contain concrete components.
  6. A concrete composite may not contain virtual composites.
  7. An immediate virtual composite may contain one or more concrete components.  For example, if Exposure D is an immediate virtual composite, its maskedImage and wcs components will be written (when D is put) as separate files.
  8. An immediate virtual composite may contain one or more virtual components, but only indirectly: it must also contain their concrete composites (this is a restatement of (3)).
  9. An immediate virtual composite may contain other immediate virtual composites.  For example, if Exposure E is an immediate virtual composite, its maskedImage component may also be an immediate virtual composite, which means that E.maskedImage.image and E.maskedImage.variance will each be written as distinct concrete datasets (i.e. separate files) when E is put.
  10. An immediate virtual composite may not contain deferred virtual composites.
  11. A deferred virtual composite may contain one or more concrete components.  For example, we could write a stand-alone Wcs F, then later define an Exposure G such that G.wcs is F.
  12. A deferred virtual composite may contain one or more virtual components.  Those virtual components must still have concrete composite parents, but those concrete composite parents need not be children of the deferred virtual composite.  This is a restatement of (2).
  13. A deferred virtual composite may contain one or more immediate virtual composites.  For example, we could write a MaskedImage H as an immediate virtual composite, resulting in H.image and H.variance being written as separate files.  We could then define an Exposure J such that J.maskedImage is H.
  14. A deferred virtual composite may contain one or more other deferred virtual composites.  For example, we could write two Images K and L, then define a MaskedImage M such that M.image=K and M.variance=L, and then define an Exposure N such that N.maskedImage=M (and N.image=K and N.variance=L).
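Rules (4) through (14) amount to a table of the kinds of child each kind of composite may hold directly, plus the extra condition in (8). The sketch below expresses that table as a validator. Kind, ALLOWED_CHILDREN and check_composite are hypothetical names. The sketch also simplifies rule (8) by requiring a virtual component's concrete parent to be a direct sibling component.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class Kind(Enum):
    CONCRETE = "concrete dataset"            # written as one unit; may itself be a composite
    VIRTUAL_COMPONENT = "virtual component"  # lives inside some concrete composite
    IMMEDIATE = "immediate virtual composite"
    DEFERRED = "deferred virtual composite"


# Child kinds each composite kind may contain directly.
ALLOWED_CHILDREN = {
    Kind.CONCRETE: {Kind.VIRTUAL_COMPONENT},                                 # rules 4-6
    Kind.IMMEDIATE: {Kind.CONCRETE, Kind.VIRTUAL_COMPONENT, Kind.IMMEDIATE},  # rules 7-10
    Kind.DEFERRED: set(Kind),                                                # rules 11-14
}


def check_composite(parent: Kind, children: Dict[str, Tuple[Kind, Optional[str]]]) -> None:
    """Raise ValueError if a composite's direct children break the rules.

    children maps component name -> (kind, owner); owner names the sibling
    component that holds a virtual component (checked only for immediate composites).
    """
    allowed = ALLOWED_CHILDREN.get(parent, set())
    for name, (kind, owner) in children.items():
        if kind not in allowed:
            raise ValueError(f"{parent.value} may not contain {kind.value} {name!r}")
        # Rule 8: virtual components only indirectly, via a concrete sibling composite.
        if parent is Kind.IMMEDIATE and kind is Kind.VIRTUAL_COMPONENT:
            if owner not in children or children[owner][0] is not Kind.CONCRETE:
                raise ValueError(f"{name!r} needs its concrete parent as a sibling component")


# Exposure D from rule (3): concrete maskedImage and wcs, plus the virtual "image".
check_composite(Kind.IMMEDIATE, {
    "maskedImage": (Kind.CONCRETE, None),
    "wcs": (Kind.CONCRETE, None),
    "image": (Kind.VIRTUAL_COMPONENT, "maskedImage"),
})

try:
    check_composite(Kind.IMMEDIATE, {"maskedImage": (Kind.DEFERRED, None)})  # rule (10)
except ValueError as err:
    print(err)
```

The sketch does not enforce the positive requirement in (4), namely that a concrete composite always has its virtual components. It only rejects forbidden children.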

...

```python
# Given the StorageClasses defined above, this line:
registry.registerDatasetType(CalExp=DatasetType(StorageClass=Exposure, DataUnits=("Visit", "Sensor")))
# implies:
# registry.registerDatasetType(CalExp.wcs=DatasetType(StorageClass=Wcs, DataUnits=("Visit", "Sensor")))
# registry.registerDatasetType(CalExp.maskedImage=DatasetType(StorageClass=MaskedImage, DataUnits=("Visit", "Sensor")))
# registry.registerDatasetType(CalExp.image=DatasetType(StorageClass=Image, DataUnits=("Visit", "Sensor")))
# registry.registerDatasetType(CalExp.variance=DatasetType(StorageClass=Image, DataUnits=("Visit", "Sensor")))
```

As we'll see below, these component DatasetTypes will be used by virtual components of concrete composites and by concrete components of immediate virtual composites. They will not be used by components of deferred virtual composites, because those components will already have been written and added to the Registry under some other DatasetType.
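The implied registrations above follow mechanically from the StorageClass. Each named component, including an alias, yields a DatasetType named "<parent>.<component>". That DatasetType has the component's StorageClass and the parent's DataUnits. The sketch below shows the derivation. StorageClass and DatasetType are hypothetical stand-ins, and resolve and impliedComponentTypes are illustrative helpers, not an existing API.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple, Union


@dataclass
class StorageClass:
    """Hypothetical stand-in: a name plus components (StorageClass or dotted alias)."""
    name: str
    components: Dict[str, Union["StorageClass", str]] = field(default_factory=dict)


@dataclass(frozen=True)
class DatasetType:
    """Hypothetical stand-in recording only what the registrations above mention."""
    name: str
    storageClass: str
    dataUnits: Tuple[str, ...]


def resolve(sc: StorageClass, component: str) -> StorageClass:
    """StorageClass of a direct component, following dotted aliases."""
    target = sc.components[component]
    if isinstance(target, str):
        for part in target.split("."):
            sc = resolve(sc, part)
        return sc
    return target


def impliedComponentTypes(parent: DatasetType, sc: StorageClass) -> Dict[str, DatasetType]:
    """One component DatasetType per named component; DataUnits are inherited."""
    implied = {}
    for comp in sc.components:
        name = f"{parent.name}.{comp}"
        implied[name] = DatasetType(name, resolve(sc, comp).name, parent.dataUnits)
    return implied


Image = StorageClass("Image")
MaskedImage = StorageClass("MaskedImage", {"image": Image, "variance": Image})
Exposure = StorageClass("Exposure", {
    "maskedImage": MaskedImage,
    "wcs": StorageClass("Wcs"),
    "image": "maskedImage.image",
    "variance": "maskedImage.variance",
})

calexp = DatasetType("CalExp", "Exposure", ("Visit", "Sensor"))
for dt in impliedComponentTypes(calexp, Exposure).values():
    print(dt.name, dt.storageClass, dt.dataUnits)
```

The sketch stops at one level and so produces exactly the four DatasetTypes listed in the comments above. The text does not say whether nested names such as CalExp.maskedImage.image would also be registered.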

...

```python
def Butler.put(self, obj, datasetType, dataId, producer=None):
    """Write a dataset.

    May not be a virtual component or a deferred virtual composite.
    """
    datasetType = self.registry.getDatasetType(datasetType)  # argument may have just been a string; now it's an object
    ref = self.registry.addDataset(datasetType, dataId, run=self.run, producer=producer)
    disassembler = self.config.getDisassembler(datasetType)
    if disassembler is not None:  # this is an immediate virtual composite
        childObjs = disassembler(obj)  # assumed: mapping from component name to component object
        for childName, childDatasetType in datasetType.components.items():
            if self.config.getWriteFormatter(childDatasetType):  # not a virtual component
                childRef = self.put(childObjs[childName], childDatasetType, dataId, producer=producer)
                self.registry.attachComponent(parent=ref, child=childRef)
            # virtual components are not written here; per (3) they belong to this
            # composite only indirectly, through their concrete parent component
        self.registry.setAssembler(ref, self.config.getAssembler(datasetType))  # Could also consider putting this in Datastore
    else:  # this is concrete (and maybe a composite)
        for childName, childDatasetType in datasetType.components.items():  # if not a composite, loop body is never executed
            childRef = self.registry.addDataset(childDatasetType, dataId, run=self.run, producer=producer)
            self.registry.attachComponent(parent=ref, child=childRef)
            self.datastore.addReader(childRef, self.config.getReadFormatter(childDatasetType))
        self.datastore.put(obj, ref)
    return ref

def Butler.link(self, datasetType, childRefs, dataId, producer=None):
    """Create a deferred virtual composite dataset by associating existing datasets.

    There are two link overloads; this one is more powerful but less convenient in the common case.
    """
    ref = self.registry.addDataset(datasetType, dataId, run=self.run, producer=producer)
    for childRef in childRefs:
        self.registry.attachComponent(parent=ref, child=childRef)
    self.registry.setAssembler(ref, self.config.getAssembler(datasetType))
    return ref

# Python has no overloading: a real implementation would need distinct names
# or dispatch on the argument types for these two signatures.
def Butler.link(self, datasetType, childDatasetTypes, dataId, producer=None):
    """Create a deferred virtual composite dataset by associating existing datasets.

    There are two link overloads; this one is less powerful but more convenient in the common case.
    """
    # Look up the DatasetRefs using the given DataId and then call the other overload.
    return self.link(datasetType,
                     [self.registry.find(childDatasetType, dataId) for childDatasetType in childDatasetTypes],
                     dataId, producer=producer)


def Registry.addDataset(self, datasetType, dataId, run, producer=None):
    dataset_id, registry_id = self.execute("INSERT INTO Dataset ...")
    return DatasetRef(datasetType, dataId, dataset_id, registry_id, ...)

def Registry.attachComponent(self, parent, child):
    self.execute("INSERT INTO DatasetComposition (parent_dataset_id, parent_registry_id, component_name, child_dataset_id, child_registry_id) ...")

def Registry.setAssembler(self, ref, assembler):
    self.execute("UPDATE Dataset SET assembler=? WHERE dataset_id=? AND registry_id=?", assembler.name, ref.dataset_id, ref.registry_id)


def Datastore.put(self, obj, ref):
    # ... actually write a file or something ...
    self.registry.execute("INSERT INTO Storage (dataset_id, registry_id, datastore_name, md4, size) ...")
    self.addReader(ref, self.config.getReadFormatter(ref.datasetType))

def Datastore.addReader(self, ref, formatter):
    # ... record somewhere that we should use the given formatter when asked to read back ref ...
    pass
```
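On the Registry side, put and link reduce to three statements: insert a Dataset row, insert DatasetComposition rows, and set the assembler. The toy version below runs against Python's built-in sqlite3 module and makes several assumptions.

* There is a single registry; REGISTRY_ID is a placeholder.
* The column sets are the minimum inferred from the statements above.
* attachComponent takes an explicit component name. The pseudocode inserts component_name without showing where the name comes from.
* run, producer and DataIds are omitted.

The example links a stand-alone Wcs F and a MaskedImage H into a deferred virtual composite G, as in rule (11).

```python
import sqlite3
from typing import Dict, NamedTuple, Optional

REGISTRY_ID = 1  # hypothetical: one registry, so every dataset shares this id


class DatasetRef(NamedTuple):
    datasetType: str
    dataset_id: int
    registry_id: int


class Registry:
    """Toy in-memory Registry holding only the tables used by put/link above."""

    def __init__(self) -> None:
        self.db = sqlite3.connect(":memory:")
        self.db.executescript("""
            CREATE TABLE Dataset (
                dataset_id INTEGER PRIMARY KEY,
                registry_id INTEGER NOT NULL,
                dataset_type TEXT NOT NULL,
                assembler TEXT
            );
            CREATE TABLE DatasetComposition (
                parent_dataset_id INTEGER, parent_registry_id INTEGER,
                component_name TEXT,
                child_dataset_id INTEGER, child_registry_id INTEGER
            );
        """)

    def addDataset(self, datasetType: str) -> DatasetRef:
        cur = self.db.execute(
            "INSERT INTO Dataset (registry_id, dataset_type) VALUES (?, ?)",
            (REGISTRY_ID, datasetType))
        return DatasetRef(datasetType, cur.lastrowid, REGISTRY_ID)

    def attachComponent(self, name: str, parent: DatasetRef, child: DatasetRef) -> None:
        # Assumption: the component name is passed in explicitly.
        self.db.execute(
            "INSERT INTO DatasetComposition VALUES (?, ?, ?, ?, ?)",
            (parent.dataset_id, parent.registry_id, name, child.dataset_id, child.registry_id))

    def setAssembler(self, ref: DatasetRef, assembler: str) -> None:
        self.db.execute(
            "UPDATE Dataset SET assembler=? WHERE dataset_id=? AND registry_id=?",
            (assembler, ref.dataset_id, ref.registry_id))

    def getAssembler(self, ref: DatasetRef) -> Optional[str]:
        row = self.db.execute(
            "SELECT assembler FROM Dataset WHERE dataset_id=? AND registry_id=?",
            (ref.dataset_id, ref.registry_id)).fetchone()
        return row[0]

    def components(self, parent: DatasetRef) -> Dict[str, int]:
        rows = self.db.execute(
            "SELECT component_name, child_dataset_id FROM DatasetComposition "
            "WHERE parent_dataset_id=? AND parent_registry_id=? ORDER BY rowid",
            (parent.dataset_id, parent.registry_id))
        return dict(rows.fetchall())


def link(registry: Registry, datasetType: str, children: Dict[str, DatasetRef],
         assembler: str) -> DatasetRef:
    """Deferred virtual composite: a new Dataset row plus composition rows; nothing is written."""
    ref = registry.addDataset(datasetType)
    for name, child in children.items():
        registry.attachComponent(name, ref, child)
    registry.setAssembler(ref, assembler)
    return ref


reg = Registry()
F = reg.addDataset("Wcs")          # stand-alone concrete Wcs
H = reg.addDataset("MaskedImage")  # stands in for an already-written MaskedImage
G = link(reg, "Exposure", {"wcs": F, "maskedImage": H}, assembler="ExposureAssembler")
print(reg.components(G), reg.getAssembler(G))
```

Only G carries an assembler. Reading G back would mean looking up its composition rows and handing the children to that assembler, while F and H remain readable on their own.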

...