...

Code Block
datasets: {
    calexp: {
        template:      "sci-results/%(run)d/%(camcol)d/%(filter)s/calexp/calexp-%(run)06d-%(filter)s%(camcol)d-%(field)04d.fits"
        python:        "lsst.afw.image.ExposureF"
        persistable:   "ExposureF"
        storage:       "FitsStorage"
        level:         "None"
        tables:        raw
        tables:        raw_skyTile
    }
    # Type 2 getter for wcs in a Type 1 calexp
    calexp_wcs: {
        template:    "sci-results/%(run)d/%(camcol)d/%(filter)s/calexp/calexp-%(run)06d-%(filter)s%(camcol)d-%(field)04d.fits"
        python:      "lsst.afw.image.Wcs"
        persistable: "Wcs"
        storage:     "FitsStorage"
    }
    # Type 2 getter for calib in a Type 1 calexp
    calexp_calib: {
        template:    "sci-results/%(run)d/%(camcol)d/%(filter)s/calexp/calexp-%(run)06d-%(filter)s%(camcol)d-%(field)04d.fits"
        python:      "lsst.afw.image.Calib"
        persistable: "ignored"
        storage:     "FitsStorage"
    }
    # Type 1 datasets for components of the Type 3 jointcalexp
    jointcal_wcs: {
        template:    "wcs/.../filename.fits"
        python:      "lsst.afw.image.Wcs"
        persistable: "Wcs"
        storage:     "FitsStorage"
    }
    jointcal_calib: {
        template:    "wcs/.../filename.fits"
        python:      "lsst.afw.image.Calib"
        persistable: "Calib"
        storage:     "FitsStorage"
    }
    # Type 3 dataset definition for jointcalexp
    jointcalexp: {
        template:    "sci-results/%(run)d/%(camcol)d/%(filter)s/calexp/calexp-%(run)06d-%(filter)s%(camcol)d-%(field)04d.fits"
        python:      "lsst.afw.image.ExposureF"
        composite: { 
            calexp: {
                datasetType: "calexp"
                inputOnly: True # is this a good var name? Meaning: do not write this component when serializing.
            }
            wcs: {
                datasetType: "jointcal_wcs"
            }
            calib: {
                datasetType: "jointcal_calib"
            }
        }
        assembler: "lsst.mypackage.jointcal.JointcalAssembler"
        disassembler: "lsst.mypackage.jointcal.JointcalDisassembler"
    }
}

An Assembler and a Disassembler for jointcalexp are written and registered at the locations specified by the policy. The Assembler/Disassembler API is still TBD. Proposing:

I think it will work well for the code that calls the dis/assembler to get each Type 1 dataset (Butler should do this recursively, since the components of Type 2 and Type 3 datasets may themselves be composite), and pass the group of those to the assembler in a dict, keyed by the component item key. A class object of the type indicated by the python field of the policy will also be passed to the assembler. If the object to return is a Type 1 member of the composite, the assembler may ignore the class object. Or, the assembler may create an instance via the class object and populate it with components from the component dict.
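
For concreteness, a minimal sketch of what that calling code might look like, assuming hypothetical Butler helpers (_lookupPolicy, _importClass, _importFunction); this illustrates the proposal, not an actual implementation:

Code Block
def getComposite(self, datasetType, dataId):
    policy = self._lookupPolicy(datasetType)  # hypothetical policy lookup
    componentDict = {}
    for name, component in policy['composite'].items():
        # recurse: a component may itself be a Type 2 or Type 3 dataset
        componentDict[name] = self.get(component['datasetType'], dataId)
    classObj = self._importClass(policy['python'])  # e.g. lsst.afw.image.ExposureF
    assembler = self._importFunction(policy['assembler'])
    return assembler(dataId, componentDict, classObj)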

Code Block
def JointcalAssembler(dataId, componentDict, classObj):
    # expects componentDict keys 'calexp', 'wcs', and 'calib' to contain Exposure, Wcs, and Calib objects, respectively.
    # in this case, load the 'base' object, and then overlay components
    exposure = componentDict['calexp']
    exposure.setWcs(componentDict['wcs'])
    exposure.setCalib(componentDict['calib'])
    return exposure
 

Calling butler.put to serialize the Exposure into the repository will decompose the Exposure via the disassembler.

This example shows only the updated wcs and calib being put into the butler's output repository. If they were the only items updated, the rest of the exposure should not need to be written; its data is unchanged from the copy in the single_visit_processing repository.

Code Block
def JointcalDisassembler(exposure, dataId, componentDict):
    componentDict['calexp'] = None  # input-only; the unchanged calexp is not rewritten
    componentDict['wcs'] = exposure.getWcs()
    componentDict['calib'] = exposure.getCalib() 
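
On the put side, a sketch under the same assumptions as the get sketch above: components that the disassembler leaves as None are treated as input-only and skipped when writing.

Code Block
def putComposite(self, obj, datasetType, dataId):
    policy = self._lookupPolicy(datasetType)  # hypothetical helper, as above
    disassembler = self._importFunction(policy['disassembler'])
    componentDict = {}
    disassembler(obj, dataId, componentDict)
    for name, component in policy['composite'].items():
        if componentDict.get(name) is None:
            continue  # e.g. 'calexp' above: unchanged input, not rewritten
        self.put(componentDict[name], component['datasetType'], dataId)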

Questions:

  • What about putting a cached wcs (if the same wcs is shared among many exposures)? Should it be that the Disassembler puts a reference to the exposure in the componentDict, and when that Type 1 dataset is to be written, butler will do a write-once-compare-same operation (this requires redundant writes, but throws away the 'same' file)? Or, it might work for the Butler cache to record that the object has already been written for a given dataset type + dataId, and skip the put if it knows it has been written.
  • Do we want to require that all the datasets that should be in the component dict (as indicated by the policy) are there? We could do one of the following (a sketch of the first option follows this list):
    • If a dataset type is not in the component dict, raise. Allow None to indicate that the dataset is input-only on this object type.
    • Silently ignore missing dataset types.
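
A minimal sketch of the first option, assuming a hypothetical policy dict whose composite section lists the declared components:

Code Block
def checkComponentDict(policy, componentDict):
    # raise if a policy-declared component is missing entirely;
    # an explicit None is allowed and marks the component as input-only
    for name in policy['composite']:
        if name not in componentDict:
            raise RuntimeError("disassembler did not provide component %r" % name)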

Conversation

Component Lookup by Processing Stage

Unknown User (npease) asked:

If new Wcs and Calib component datasets are written, but other component datasets are not replaced: is it important (or possible?) to specify which component datasets should be from calexp.jointcal and which are ok to have come from the single-visit calexp? I'm currently imagining a lookup algorithm for components:

Code Block
butler.get(datasetType='calexp:jointcal', ...)
    for each component in composite:
        location = butler.map(datasetType, dataId)
        if location:
            # I figure it will find locations for 'wcs' and 'calib'
        else:
            # There are lots of other possible components in MaskedImage
            # and ExposureInfo (psf, detector, validPolygon, filter,
            # coaddInputs, etc). What to do?

...

Code Block
    calexp_psf: {
        template:    "<psf template>"
        python:      "lsst.afw.detection.Psf"
        persistable: "Psf"
        storage:     "FitsStorage"
    }
 
    calexp: {
        template:    "sci-results/%(run)d/%(camcol)d/%(filter)s/calexp/calexp-%(run)06d-%(filter)s%(camcol)d-%(field)04d.fits"
        python:      "lsst.afw.image.ExposureF"
        persistable: "ExposureF"
        storage:     "FitsStorage"
    }
 
    compositecalexp: {
        python:      "lsst.afw.image.ExposureF"
        storage:     "FitsStorage"
        composite: { 
            psf: {
                datasetType: "calexp_psf"
            }
            calib: {
                datasetType: "calexp_calib" # (policy for this dataset type is not shown)
            }
        }
        assembler: "lsst.mypackage.CalexpAssembler"
        disassembler: "lsst.mypackage.CalexpDisassembler"
    }

The user calls butler.get('calexp.psf', dataId={...}). Butler looks up the dataset type definition for calexp, and in composite finds its component dataset type psf, which refers to the Type 1 dataset type calexp_psf. Butler gets the Psf object according to that dataset type definition and returns the object.
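
A sketch of that lookup, with the caveat that helper names like _lookupPolicy are assumptions, not the real Butler internals:

Code Block
def getComponent(self, datasetType, dataId):
    # 'calexp.psf' -> base dataset type 'calexp', component name 'psf'
    baseType, componentName = datasetType.split('.')
    policy = self._lookupPolicy(baseType)
    component = policy['composite'][componentName]  # e.g. {'datasetType': 'calexp_psf'}
    # a plain Type 1 get of the referenced dataset type returns the Psf
    return self.get(component['datasetType'], dataId)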

...

Code Block
    calexp: {
        template:    "sci-results/%(run)d/%(camcol)d/%(filter)s/calexp/calexp-%(run)06d-%(filter)s%(camcol)d-%(field)04d.fits"
        python:      "lsst.afw.image.ExposureF"
        persistable: "ExposureF"
        storage:     "FitsStorage"
    }


Question

I can't think of a reliable way to get the Psf that an Exposure "would have" loaded (or any member object that a class would have instantiated in that class's deserializer) without actually running the class's deserializer. We could instantiate the Exposure, infer that the name of the getter is getPsf, and return the result of that getter, but this does not save us from having to instantiate the entire Exposure. Is this worth it, or is it better to just require a Type 2 or Type 3 dataset definition for component loading?
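
For reference, the "instantiate and infer the getter" fallback described above might look like this sketch; the getter-name inference is the fragile part:

Code Block
def getComponentViaParent(self, baseType, componentName, dataId):
    # load the entire Type 1 parent object (a full ExposureF read)...
    parent = self.get(baseType, dataId)
    # ...then derive the getter name from the component name: 'psf' -> getPsf
    getter = getattr(parent, 'get' + componentName[0].upper() + componentName[1:])
    return getter()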

...

Use a policy that describes the composite dataset type:

One way to do it is to write an assembler that adds the Wcs and Calib to the SourceCatalog object.

Code Block
 datasets: {
    icSrc: {
        template:      "sci-results/%(run)d/%(camcol)d/%(filter)s/icSrc/icSrc-%(run)06d-%(filter)s%(camcol)d-%(field)04d.fits"
        python:        "lsst.afw.table.SourceCatalog"
        persistable:   "ignored"
        storage:       "FitsCatalogStorage"
        tables:        raw
        tables:        raw_skyTile
    }
    wcs: {
        template:    "wcs/.../filename.fits"
        python:      "lsst.afw.image.Wcs"
        persistable: "Wcs"
        storage:     "FitsStorage"
    }

    calib: {
        template:    "wcs/.../filename.fits"
        python:      "lsst.afw.image.Calib"
        persistable: "Calib"
        storage:     "FitsStorage"
    }
    extended_icSrc: {
        python:      "lsst.afw.table.SourceCatalog"
        composite: {
            icSrc: {
                datasetType: "icSrc"
            }
            wcs: {
                datasetType: "wcs"
            }
            calib: {
                datasetType: "calib"
            }
        }
        assembler:    "lsst.mypackage.extended_icSrc_assembler"
        disassembler: "lsst.mypackage.extended_icSrc_disassembler"
    }
}

The assembler and disassembler could look like this, monkey patching the components onto the SourceCatalog:

Code Block
def extended_icSrc_assembler(dataId, componentDict, classObj):
    # in this case, load the 'base' object, and then add in components as a monkey patch
    srcCat = componentDict['icSrc']
    srcCat.wcs = componentDict['wcs']
    srcCat.calib = componentDict['calib']
    return srcCat
 
def extended_icSrc_disassembler(srcCat, dataId, componentDict):
    componentDict['icSrc'] = srcCat
    componentDict['wcs'] = srcCat.wcs
    componentDict['calib'] = srcCat.calib

Or if the SourceCatalog class was extended to have setters & getters then the setters could be called by the assembler and the getters by the disassembler.

Code Block
def extended_icSrc_assembler(dataId, componentDict, classObj):
    # load the 'base' object, and then attach the components via its setters
    srcCat = componentDict['icSrc']
    srcCat.setWcs(componentDict['wcs'])
    srcCat.setCalib(componentDict['calib'])
    return srcCat
 
def extended_icSrc_disassembler(srcCat, dataId, componentDict):
    componentDict['icSrc'] = srcCat
    componentDict['wcs'] = srcCat.getWcs()
    componentDict['calib'] = srcCat.getCalib()

By Pure-Composite Container Class

Another way would be to write a python type that contains source catalog, Wcs, and Calib and use the generic assembler.

Code Block
class SourceCatalogWithInfo:
    def __init__(self):
        """no-op constructor"""
        pass
    # set and get all the component data individually:
    def setSourceCatalog(self, sourceCatalog):
        self.sourceCatalog = sourceCatalog
    def getSourceCatalog(self):
        return self.sourceCatalog
    def setWcs(self, wcs):
        self.wcs = wcs
    def getWcs(self):
        return self.wcs
    def setCalib(self, calib):
        self.calib = calib
    def getCalib(self):
        return self.calib

Assembler & Disassembler:

Code Block
def extended_icSrc_assembler(dataId, componentDict, classObj):
    # construct the container via the class object, then set the components on it
    srcCatEx = classObj()
    srcCatEx.setSourceCatalog(componentDict['icSrc'])
    srcCatEx.setWcs(componentDict['wcs'])
    srcCatEx.setCalib(componentDict['calib'])
    return srcCatEx
 
def extended_icSrc_disassembler(srcCatEx, dataId, componentDict):
    componentDict['icSrc'] = srcCatEx.getSourceCatalog()
    componentDict['wcs'] = srcCatEx.getWcs()
    componentDict['calib'] = srcCatEx.getCalib()

Modify the policy to use the new type
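
A hedged sketch of that policy change (only the python field changes, to point at the container class; the lsst.mypackage module path is an assumption carried over from the assembler examples):

Code Block
    extended_icSrc: {
        python:      "lsst.mypackage.SourceCatalogWithInfo"
        composite: {
            icSrc: {
                datasetType: "icSrc"
            }
            wcs: {
                datasetType: "wcs"
            }
            calib: {
                datasetType: "calib"
            }
        }
        assembler:    "lsst.mypackage.extended_icSrc_assembler"
        disassembler: "lsst.mypackage.extended_icSrc_disassembler"
    }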

...

Pseudocode

Note, Jim says "We haven't yet implemented attaching background objects to Exposures yet, but it's been our to-do list for a long time. Currently our background class is pure-Python."

The background class is lsst.afw.math.BackgroundList

...

Code Block
def ExposureWithBackgroundAssembler(dataId, componentDict, classObj):
    exposure = componentDict['exposure']
    exposure.setBackground(componentDict['background'])
    return exposure
# do we actually need to pass in a componentDict; is it likely to pre-contain data that the disassembler will need?
def ExposureWithBackgroundDisassembler(obj, dataId, componentDict):
    componentDict['exposure'] = obj.exposure
    componentDict['background'] = obj.background
    return componentDict

Use:

Code Block
# Create a butler that reads the pre-detection data as input and will put the post-detection data to the output
import lsst.daf.persistence as dafPersist
butler = dafPersist.Butler(input="datasets/A", output="datasets/B")
dataId = {...}
exp = butler.get("coadd_detection", dataId)
exp.runDetectionProcessing()
# this put will write both the background and the exposure to the repository at "datasets/B"
butler.put("coadd_detection", dataId)
# later...
butler = dafPersist.Butler(input="datasets/B", outputs="datasets/C")
exp = butler.get("coadd_postdetect", dataId)
# do not modify the exposure object - it will not get written!)    
exp.performOperationsThatDoNotChangeTheExposure()
# this will write the background to the repo at "datasets/C" but not the exposure.
butler.put("coadd_postdetect", dataId)

...

5. Composite Object Tables

...

Code Block
def srccat_bg_assembler(dataId, componentDict, classObj):
    # my understanding, per the example, is that there is no function to combine these datasets in the way
    # that the example wants to do it, so we'll just use a 'fake' function called 'doAssembly'.
    # TODO where does the output (composite) object originate? In some cases it may start as a component, if it has     
    # to be loaded as a type 1 and then appended/overwritten with components.
    obj = classObj()
    obj = doAssembly(obj, componentDict['blue'], componentDict['green'])
    return obj

...

  • Composable and Decomposable & Pluggable

    • Use: for the dataset name specified for the aggregate case, there will be a plugin that knows how to combine the datasets into a single aggregate object (or knows how to call the constructor in a way that will cause it to combine them); a sketch of such a plugin follows this list.
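
A hedged sketch of such a plugin for the deepCoadd_src case used below; the componentDict keying (one SourceCatalog per matching tract+patch) and the catalog-concatenation calls are assumptions for illustration:

Code Block
def deepCoadd_src_aggregate_assembler(dataId, componentDict, classObj):
    # componentDict is assumed to hold one SourceCatalog per (tract, patch)
    # combination that matched the partial dataId
    catalogs = list(componentDict.values())
    # concatenate into a single catalog; deep copies keep the result contiguous
    aggregate = catalogs[0].copy(deep=True)
    for catalog in catalogs[1:]:
        aggregate.extend(catalog, deep=True)
    return aggregate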

Pseudocode

?? Question: is there a case where the aggregate composite should be persisted as a whole? Probably it would get written directly as a Type 1 dataset? Or there would be a disassembler (not shown) that knows how to break apart the aggregate deepCoadd and pass it back to be persisted as deepCoadd_src.
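
That not-shown disassembler might look like the following sketch; groupByPatch is a hypothetical helper, since how per-patch provenance is carried on the aggregate is not specified here:

Code Block
def deepCoadd_src_aggregate_disassembler(aggregate, dataId, componentDict):
    # split the aggregate back into per-patch catalogs keyed by (tract, patch);
    # each piece would then be persisted as a deepCoadd_src dataset
    for key, catalog in groupByPatch(aggregate):  # hypothetical helper
        componentDict[key] = catalog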

...

Code Block
import lsst.daf.persistence as dafPersist
butler = dafPersist.Butler(input="my/input/dir")
dataId = {'filter': 'g'}  # note this leaves out tract and patch
sourceCat = butler.get("deepCoadd_src_aggregate", dataId)
# ...use the sourceCat

...

7. Access to Precursor Datasets (via Persisted DataId)

...