...
Code Block:
datasets: {
calexp: {
template: "sci-results/%(run)d/%(camcol)d/%(filter)s/calexp/calexp-%(run)06d-%(filter)s%(camcol)d-%(field)04d.fits"
python: "lsst.afw.image.ExposureF"
persistable: "ExposureF"
storage: "FitsStorage"
level: "None"
tables: raw
tables: raw_skyTile
}
# Type 2 getter for wcs in a Type 1 calexp
calexp_wcs: {
template: "sci-results/%(run)d/%(camcol)d/%(filter)s/calexp/calexp-%(run)06d-%(filter)s%(camcol)d-%(field)04d.fits"
python: "lsst.afw.image.Wcs"
persistable: "Wcs"
storage: "FitsStorage"
}
# Type 2 getter for calib in a Type 1 calexp
calexp_calib: {
template: "sci-results/%(run)d/%(camcol)d/%(filter)s/calexp/calexp-%(run)06d-%(filter)s%(camcol)d-%(field)04d.fits"
python: "lsst.afw.image.Calib"
persistable: "ignored"
storage: "FitsStorage"
}
# Type 1 datasets for components of the Type 3 jointcalexp
joint_wcs: {
template: "wcs/.../filename.fits"
python: "lsst.afw.image.Wcs"
persistable: "Wcs"
storage: "FitsStorage"
}
joint_calib: {
template: "wcs/.../filename.fits"
python: "lsst.afw.image.Calib"
persistable: "Calib"
storage: "FitsStorage"
}
# Type 3 dataset definition for jointcalexp
jointcalexp: {
template: "sci-results/%(run)d/%(camcol)d/%(filter)s/calexp/calexp-%(run)06d-%(filter)s%(camcol)d-%(field)04d.fits"
python: "lsst.afw.image.ExposureF"
composite: {
calexp: {
datasetType: "calexp"
inputOnly: True # is this a good name? Meaning: do not write this component when serializing.
}
wcs: {
datasetType: "joint_wcs"
}
calib: {
datasetType: "joint_calib"
}
}
assembler: "lsst.mypackage.jointcal.JointcalAssembler"
disassembler: "lsst.mypackage.jointcal.JointcalDisassembler"
}
}
The Assembler and Disassembler for jointcalexp are written and saved as specified by the policy. The Assembler/Disassembler API is still TBD. Proposal: the code that calls the dis/assembler will get each Type 1 dataset (Butler should do this recursively, getting the objects specified by the components of Type 2 and Type 3 datasets) and pass the group of those objects in a dict to the assembler, with the component item key as the dict key. A class object of the type indicated by the python field of the policy will also be passed to the Assembler. If the object to return is a Type 1 member of the composite, the Assembler may ignore the class object. Alternatively, the Assembler may create an instance via the class object and populate it with components from the component dict.
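As a rough sketch of this calling convention (all helper names here are hypothetical; the real Butler API is TBD), the caller might resolve the dotted python and assembler names from the policy, fetch each component, and hand them to the assembler as a dict keyed by component name:

```python
import importlib

def load_dotted(path):
    """Resolve a dotted name like 'lsst.afw.image.ExposureF' to an object."""
    module_name, attr = path.rsplit('.', 1)
    return getattr(importlib.import_module(module_name), attr)

def get_composite(policy, dataId, get_type1):
    """Hypothetical driver: fetch each Type 1 component, then call the assembler.

    policy    -- the composite dataset's policy entry, modeled here as a dict
    get_type1 -- callable (datasetType, dataId) -> object, standing in for
                 Butler's recursive get of each component dataset
    """
    componentDict = {name: get_type1(spec['datasetType'], dataId)
                     for name, spec in policy['composite'].items()}
    assembler = load_dotted(policy['assembler'])
    classObj = load_dotted(policy['python'])
    return assembler(dataId, componentDict, classObj)
```

This is only a sketch of the data flow; the real policy objects and dataset lookup are richer than plain dicts.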
Code Block:

def JointcalAssembler(dataId, componentDict, classObj):
    # Expects componentDict keys 'calexp', 'wcs', and 'calib' to contain
    # Exposure, Wcs, and Calib objects, respectively.
    # In this case, load the 'base' object, and then overlay components.
    exposure = componentDict['calexp']
    exposure.setWcs(componentDict['wcs'])
    exposure.setCalib(componentDict['calib'])
    return exposure
Calling butler.put to serialize the Exposure into the repository will decompose the Exposure via the disassembler. This example shows only the updated wcs and calib being put into the butler's output repository. If they were the only items updated, then the rest of the exposure should not need to be written; its data is unchanged from where it was located in the single_visit_processing repository.
Code Block:

def JointcalDisassembler(exposure, dataId, componentDict):
    componentDict['calexp'] = None  # input-only: do not rewrite the full exposure
    componentDict['wcs'] = exposure.getWcs()
    componentDict['calib'] = exposure.getCalib()
Questions:
- What about putting a cached wcs (if the same wcs is shared among many exposures)? Should the Disassembler put a reference to the exposure in the componentDict, and when that Type 1 dataset is to be written, Butler performs a write-once-compare-same operation (this requires redundant writes, but throws away the 'same' file)? Or, it might work for the Butler cache to record that the object has already been written for a given dataset type + dataId, and skip the put if it knows it has been written.
- Do we want to require that all the datasets that should be in the component dict (as indicated by the policy) are there? We could do one of:
- If a dataset type is not in the component dict, raise. Allow None to indicate that the dataset is input-only on this object type.
- Silently ignore missing dataset types.
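The write-once-compare-same idea above could be sketched like this (a hypothetical helper, not Butler API): serialize the candidate dataset to bytes, and skip the write when an existing file already has identical content.

```python
import hashlib
import os

def write_once_compare_same(path, data):
    """Write `data` (bytes) to `path`, unless an identical file already exists.

    Returns True if a write happened, False if the existing content matched
    (the redundant copy is thrown away). Raises if the file exists with
    *different* content, since that would silently clobber another dataset.
    """
    digest = hashlib.sha256(data).hexdigest()
    if os.path.exists(path):
        with open(path, 'rb') as f:
            existing = hashlib.sha256(f.read()).hexdigest()
        if existing == digest:
            return False  # same content: skip the write
        raise RuntimeError("refusing to overwrite %s with different content" % path)
    with open(path, 'wb') as f:
        f.write(data)
    return True
```

The caching alternative (record dataset type + dataId of completed puts and skip repeats) avoids the redundant serialization but requires trusting the cache across processes.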
Conversation
Component Lookup by Processing Stage
Unknown User (npease) asked:
If new Wcs and Calib component datasets are written, but other component datasets are not replaced: is it important (or possible?) to specify which component datasets should come from calexp.jointcal and which are OK to have come from the single-visit calexp? I'm currently imagining a lookup algorithm for components:
Code Block:

butler.get(datasetType='calexp:jointcal', ...)
for each component in composite:
    location = butler.map(datasetType, dataId)
    if location:
        # I figure it will find locations for 'wcs' and 'calib'
    else:
        # There are lots of other possible components in MaskedImage
        # and ExposureInfo (psf, detector, validPolygon, filter,
        # coaddInputs, etc). What to do?
...
Code Block:

calexp_psf: {
template: "<psf template>"
python: "lsst.afw.detection.Psf"
persistable: "Psf"
storage: "FitsStorage"
}
calexp: {
template: "sci-results/%(run)d/%(camcol)d/%(filter)s/calexp/calexp-%(run)06d-%(filter)s%(camcol)d-%(field)04d.fits"
python: "lsst.afw.image.ExposureF"
persistable: "ExposureF"
storage: "FitsStorage"
composite: {
psf: {
datasetType: "calexp_psf"
}
calib: {
datasetType: "calib_psf" # (policy for this dataset type is not shown)
}
}
assembler: "lsst.mypackage.CalexpAssembler"
disassembler: "lsst.mypackage.CalexpDisassembler"
}
The user calls butler.get('calexp.psf', dataId={...}). Butler looks up the dataset type definition for calexp and, in composite, finds its component dataset type psf, which refers to the Type 1 dataset type calexp_psf. Butler gets the Psf object according to that dataset type definition and returns the object.
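A minimal sketch of that lookup (with the policy modeled as plain dicts; names are hypothetical, not Butler internals): split the dotted dataset name, find the component entry in the composite policy, and delegate to the referenced Type 1 definition.

```python
def resolve_component(policies, name):
    """Resolve a dotted name like 'calexp.psf' to its Type 1 dataset type.

    policies -- dict mapping dataset type name -> policy entry (a dict)
    Returns the Type 1 dataset type name, e.g. 'calexp_psf'.
    """
    base, _, component = name.partition('.')
    policy = policies[base]
    composite = policy.get('composite', {})
    if component not in composite:
        raise KeyError("no component %r in dataset type %r" % (component, base))
    return composite[component]['datasetType']
```

Butler would then perform an ordinary Type 1 get against the returned dataset type.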
...
Code Block:

calexp: {
template: "sci-results/%(run)d/%(camcol)d/%(filter)s/calexp/calexp-%(run)06d-%(filter)s%(camcol)d-%(field)04d.fits"
python: "lsst.afw.image.ExposureF"
persistable: "ExposureF"
storage: "FitsStorage"
}
Question
I can't think of a reliable way to get the Psf that an Exposure "would have" loaded (or any member object that a class would have instantiated in that class's deserializer) without actually running the class's deserializer. We could instantiate the Exposure, infer that the name of the getter is getPsf, and return the result of that get operation. But this does not save us from having to instantiate the entire Exposure. Is that worth it, or is it better to require that component loading have a Type 2 or Type 3 dataset definition?
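The getter-name inference mentioned above could look like the following (a hypothetical helper): derive 'getPsf' from the component name 'psf' and call it on a fully deserialized object. Note that the object still has to be instantiated first, which is exactly the cost in question.

```python
def get_component_via_getter(obj, component):
    """Infer the conventional getter name (e.g. 'psf' -> 'getPsf') and call it."""
    getter_name = 'get' + component[0].upper() + component[1:]
    getter = getattr(obj, getter_name, None)
    if getter is None:
        raise AttributeError("%s has no getter %s"
                             % (type(obj).__name__, getter_name))
    return getter()
```

This convention-based dispatch is fragile (it fails for any component whose getter does not follow the naming pattern), which is one argument for requiring an explicit Type 2/Type 3 definition instead.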
...
Use a policy that defines a dataset type describing the composite. One way to do it is to write an assembler that adds the Wcs and Calib to the SourceCatalog object.
Code Block:
datasets: {
icSrc: {
template: "sci-results/%(run)d/%(camcol)d/%(filter)s/icSrc/icSrc-%(run)06d-%(filter)s%(camcol)d-%(field)04d.fits"
python: "lsst.afw.table.SourceCatalog"
persistable: "ignored"
storage: "FitsCatalogStorage"
tables: raw
tables: raw_skyTile
}
wcs: {
template: "wcs/.../filename.fits"
python: "lsst.afw.image.Wcs"
persistable: "Wcs"
storage: "FitsStorage"
}
calib: {
template: "wcs/.../filename.fits"
python: "lsst.afw.image.Calib"
persistable: "Calib"
storage: "FitsStorage"
}
extended_icSrc: {
python: "lsst.afw.table.SourceCatalog"
composite: {
icSrc: {
datasetType: "icSrc"
}
wcs: {
datasetType: "wcs"
}
calib: {
datasetType: "calib"
}
}
assembler: "lsst.mypackage.extended_icSrc_assembler"
disassembler: "lsst.mypackage.extended_icSrc_disassembler"
}
}
The assembler and disassembler could look like this, monkey-patching the SourceCatalog:
Code Block:

def extended_icSrc_assembler(dataId, componentDict, classObj):
    # In this case, load the 'base' object, and then add in components
    # as a monkey patch.
    srcCat = componentDict['icSrc']
    srcCat.wcs = componentDict['wcs']
    srcCat.calib = componentDict['calib']
    return srcCat

def extended_icSrc_disassembler(srcCat, dataId, componentDict):
    componentDict['icSrc'] = srcCat
    componentDict['wcs'] = srcCat.wcs
    componentDict['calib'] = srcCat.calib
Or, if the SourceCatalog class were extended to have setters and getters, then the setters could be called by the assembler and the getters by the disassembler.
Code Block:

def extended_icSrc_assembler(dataId, componentDict, classObj):
    # Load the 'base' object, and then add in components via setters.
    srcCat = componentDict['icSrc']
    srcCat.setWcs(componentDict['wcs'])
    srcCat.setCalib(componentDict['calib'])
    return srcCat

def extended_icSrc_disassembler(srcCat, dataId, componentDict):
    componentDict['icSrc'] = srcCat
    componentDict['wcs'] = srcCat.getWcs()
    componentDict['calib'] = srcCat.getCalib()
By Pure-Composite Container Class
Another way would be to write a Python type that contains the source catalog, Wcs, and Calib, and use the generic assembler.
Code Block:

class SourceCatalogWithInfo:
    def __init__(self):
        """no-op constructor"""
        pass

    # set all the component data individually:
    def setSourceCatalog(self, sourceCatalog):
        self.sourceCatalog = sourceCatalog

    def getSourceCatalog(self):
        return self.sourceCatalog

    def setWcs(self, wcs):
        self.wcs = wcs

    def getWcs(self):
        return self.wcs

    def setCalib(self, calib):
        self.calib = calib

    def getCalib(self):
        return self.calib
Assembler & Disassembler:
Code Block:

def extended_icSrc_assembler(dataId, componentDict, classObj):
    srcCatEx = classObj()
    srcCatEx.setSourceCatalog(componentDict['icSrc'])
    srcCatEx.setWcs(componentDict['wcs'])
    srcCatEx.setCalib(componentDict['calib'])
    return srcCatEx

def extended_icSrc_disassembler(srcCatEx, dataId, componentDict):
    componentDict['icSrc'] = srcCatEx.getSourceCatalog()
    componentDict['wcs'] = srcCatEx.getWcs()
    componentDict['calib'] = srcCatEx.getCalib()
Modify the policy to use the new type
...
Composable and Decomposable & Pluggable
- Use: The butler plugin for building the coadd needs to be written so that replaceable component objects are fetched individually. As newer versions of component objects are created, they are 'put' into later/newer repositories. This way they mask earlier versions of objects (and/or do they need to be findable by Processing Stage, similar to how it's described in Access Exposure With Updated WCS?).
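The masking behavior described above can be sketched as a newest-first search over chained repositories. This is a simplified model (repositories as dicts keyed by dataset type and dataId), not Butler's actual repository search:

```python
def chained_get(repos, datasetType, dataId):
    """Search repositories newest-first; the first hit masks older versions.

    repos -- list of dicts, ordered oldest to newest, each mapping
             (datasetType, frozen dataId) -> object
    """
    key = (datasetType, tuple(sorted(dataId.items())))
    for repo in reversed(repos):  # newest repository first
        if key in repo:
            return repo[key]
    raise LookupError("dataset not found: %s %s" % (datasetType, dataId))
```

Putting an updated component into a later repository thus shadows the earlier version without rewriting it.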
Pseudocode
Note: Jim says, "We haven't implemented attaching background objects to Exposures yet, but it's been on our to-do list for a long time. Currently our background class is pure Python."
The background class is lsst.afw.math.BackgroundList
...
Code Block:

def ExposureWithBackgroundAssembler(dataId, componentDict, classObj):
    exposure = componentDict['exposure']
    exposure.setBackground(componentDict['background'])
    return exposure

# Do we actually need to pass in a componentDict; is it likely to
# pre-contain data that the disassembler will need?
def ExposureWithBackgroundDisassembler(obj, dataId, componentDict):
    componentDict['exposure'] = obj.exposure
    componentDict['background'] = obj.background
    return componentDict
Use:
Code Block:

# Create a butler whose input is the pre-detection data and whose output
# will receive the post-detection data.
import lsst.daf.persistence as dafPersist

butler = dafPersist.Butler(inputs="datasets/A", outputs="datasets/B")
dataId = {...}
exp = butler.get("coadd_detection", dataId)
exp.runDetectionProcessing()
# This put will write both the background and the exposure to the
# repository at "datasets/B".
butler.put(exp, "coadd_detection", dataId)

# later...
butler = dafPersist.Butler(inputs="datasets/B", outputs="datasets/C")
exp = butler.get("coadd_postdetect", dataId)
# Do not modify the exposure object - it will not get written!
exp.performOperationsThatDoNotChangeTheExposure()
# This will write the background to the repo at "datasets/C" but not
# the exposure.
butler.put(exp, "coadd_postdetect", dataId)
...
5. Composite Object Tables
...
Code Block:

def srccat_bg_assembler(dataId, componentDict, classObj):
    # My understanding, per the example, is that there is no function to
    # combine these datasets in the way the example wants to do it, so
    # we'll just use a 'fake' function called 'doAssembly'.
    # TODO: where does the output (composite) object originate? In some
    # cases it may start as a component, if it has to be loaded as a
    # Type 1 and then appended/overwritten with components.
    obj = classObj()
    obj = doAssembly(obj, componentDict['blue'], componentDict['green'])
    return obj
...
Composable and Decomposable & Pluggable
- Use: for the dataset name specified for the aggregate case, there will be a plugin that knows how to combine the datasets (or knows how to call the constructor in a way that causes it to combine them) into a single aggregate object.
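A sketch of such an aggregate plugin (hypothetical names; the real combine step depends on the dataset class): expand the partial dataId over the missing keys, fetch each piece, and concatenate the results.

```python
def aggregate_get(get_one, partial_dataId, expansions):
    """Fetch one dataset per expanded dataId and combine by concatenation.

    get_one    -- callable(dataId) -> list-like dataset (e.g. a catalog)
    expansions -- list of dicts filling in keys missing from partial_dataId,
                  e.g. [{'tract': 0, 'patch': '1,1'}, ...]
    """
    combined = []
    for extra in expansions:
        dataId = dict(partial_dataId, **extra)  # merge partial id + expansion
        combined.extend(get_one(dataId))
    return combined
```

For catalogs, concatenation is the natural combine step; other dataset classes would need their own plugin logic.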
Pseudocode
Question: is there a case where the aggregate composite should be persisted as a whole? Probably it would get written directly as a Type 1 dataset? Or there would be a disassembler (not shown) that knows how to break apart the aggregate deepCoadd and pass it back to be persisted as deepCoadd_src.
...
Code Block:

import lsst.daf.persistence as dafPersist

butler = dafPersist.Butler(inputs="my/input/dir")
dataId = {'filter': 'g'}  # note this leaves out tract and patch
sourceCat = butler.get("deepCoadd_src_aggregate", dataId)
# ...use the sourceCat
...
7. Access to Precursor Datasets (via Persisted DataId)
...