LDM-151 Level 1

Single frame processing

  • Should we call out cases where the order of steps in a processing box doesn't match the plan? Not right now; the diagram doesn't include specific iteration processes, but missing steps should be called out.
  • We should have definitions for the various internal terms we use like "ICExp", etc.
  • How should we distinguish products that are persisted for internal use from those sent to the community? Plan to add new symbols for community output products.
  • Are we waiting for both snaps to be available before starting this process? We run ISR on snaps individually, then combine snaps, so we can at least do that bit separately first. Not clear how much that gains, since ISR is cheap (except maybe brighter-fatter).
  • Robert wants calls to the butler to be able to block while waiting for the second snap to arrive. Enables parallelization (see the first sketch after this list).
  • Brighter-Fatter is missing from the diagram, belongs in ISR. 
  • Camera provides the crosstalk correction; do they need any of our processing steps before they can apply it? Probably, but it's not a big deal if we end up doing it ourselves.
  • Might also need to include corrections for pixel size variations here; less important for L1. Still a debate on how to handle this.
  • Do we know how much time each step takes? We have general estimates from the Bremerton meeting, but not in detail. 
  • What about differencing the snaps against each other? Useful for CR rejection. Where do we do CR rejection? Right now we use morphology because we don't have snaps, but we will use snaps if we can.
  • Can we ask the butler for the previous PSF? Conceptually yes, but it would have to be implemented in the butler. Discussion of communication with the OCS or the EFD. Right now pipelines will not have direct access to either the OCS or the EFD (correct?).
  • We can fit the PSF on the first snap in the 15 seconds while we're waiting for the second snap. But then what happens if we get rid of snaps?
  • Newly identified task: We may need to estimate the PSF as part of ISR to enable CR rejection.
  • We are currently not persisting the difference between snaps. We will want it; add.
  • We need to decide if we are going to alert on the difference between snaps. The answer is no: we have never said we will alert on the snap difference, and it is not in the baseline.
  • CR rejection is significant, so we should call it out in the diagram clearly.
  • Robert: We could detect at two thresholds rather than detect deep, measure, then cut at bright and faint limits (see the second sketch after this list). Relevant to the recent DM-4692.
  • Everything in this box is currently per chip. This is probably good enough for background estimation, but we might want to do larger-scale WCS estimation. This stage of L1 gives "not photometry-grade" background estimation; but we will want to do forced photometry on these, so don't we need "photometry-grade"? Concern about background estimation in low-latitude crowded fields.
  • We need to feed information back to the observatory, e.g. the PSF. On what timescales? The visit after next? Who runs this? It runs at the site, but it's our code, possibly at some degraded level; that degraded level might mean single-CCD processing rather than full focal plane. Requires synchronization.
  • Can we guarantee that we will be guiding on each exposure? If so, then we will know where we are pointing and might not need a global WCS solution to find where we are. If we aren't guiding, then we may have a poorer initial guess of where the exposure is on the sky.
  • Telescope wavefront sensing does not depend on us; we are not relying on it for L1, but we will use its results for DRP. We are not planning on using the wavefront sensing, but if we did, we would ensure their software was good enough.
  • We have calexps, calibrated bright stars, and PSFs at the end of SFM. Is this enough for at-the-observatory QA? Maybe, but we might not have things like atmospheric transmission. Worried about crowded fields.
  • We must operate for two days if we lose the network link to the summit, so this (single frame processing and required QA) may need to be runnable on the summit. How is Telescope going to do this; will they use our (software) infrastructure? Runs on the camera cluster. We don't have clear answers on how this will work. Issue for the PST.
  • Current output in the diagram shows the calexp but doesn't include the photometric catalog. We are already generating this; add it to the diagram.
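
A minimal sketch of the blocking-butler idea above (the wrapper and its polling loop are hypothetical; it assumes only a butler-like object exposing datasetExists() and get()):

    # Sketch only: poll until the second snap's dataset exists, then fetch it.
    # A real implementation would live inside the butler itself.
    import time

    def get_blocking(butler, dataset_type, data_id, timeout=30.0, poll=0.5):
        """Return the dataset once it appears, or None after `timeout` seconds."""
        deadline = time.time() + timeout
        while time.time() < deadline:
            if butler.datasetExists(dataset_type, data_id):
                return butler.get(dataset_type, data_id)
            time.sleep(poll)
        return None

    # e.g.: second_snap = get_blocking(butler, "postISRCCD", {"visit": v, "snap": 1})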
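
And a toy, numpy-only illustration of the two-threshold detection idea; the real pipeline would use the stack's detection code, and the image, thresholds, and fake sources below are made up:

    # Detect once at a faint and once at a bright threshold, instead of
    # detecting deep, measuring everything, and cutting afterwards.
    import numpy as np
    from scipy import ndimage

    def detect(image, sigma, nsigma):
        labels, nsrc = ndimage.label(image > nsigma * sigma)
        return labels, nsrc

    rng = np.random.default_rng(42)
    img = rng.normal(0.0, 1.0, (256, 256))
    img[100:104, 100:104] += 20.0   # bright fake source
    img[50:53, 50:53] += 6.0        # faint fake source

    _, n_faint = detect(img, sigma=1.0, nsigma=5.0)    # full faint catalog
    _, n_bright = detect(img, sigma=1.0, nsigma=10.0)  # bright subset, measured first
    print(n_bright, n_faint)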

Difference Imaging

  • Template generation: comes from DRP. Will include whatever is necessary for DCR correction. That might be a datacube with wavelength as the third axis. That might cause issues with the sizing model, depending on how many airmass/seeing bins are in the baseline. We have room for three copies per template per band, so we need to check that this DCR plan will fit.
  • Will have to bootstrap the first templates, presumably from commissioning.
  • We will have to do good warping of the templates to the science image. Can we precompute this in any way, maybe doing it after the first snap (see the sketch after this list)? How we generate templates remains an open question.
  • After detecting DIA sources, we will do forced photometry on the difference between snaps for fpDiff. We generate this snap-diff image anyway for CR rejection. Need to show that this product gets pulled in from ISR.
  • Do we need a deblender on the difference images? Probably useful, but might be an acceptable loss if we didn't do it.
  • In general: need to get feedback on what science questions we need to develop answers to, for both Sim and DM Science group tasking.
  • How do we inject false sources? Not clear if we do this in parallel or in the live L1 system. What diagnostics do we need to have live? Might do well enough with sky sources rather than fakes, leaving fakes to the annual L1 reprocessing. Debate; Andy will ask the science experts about this. We cannot wait a year to provide exposure depths, but we think we can do this with sky sources. Do we want to let through some fraction of bogus objects?
    • Andrew Connolly to ask science experts about the injection of false sources into the image data.
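
A minimal sketch of the template-warping step flagged above, assuming FITS files with valid WCS headers and using the community reproject package as a stand-in for the stack's own warping primitives (file names are placeholders):

    # Resample the template onto the science image's pixel grid.
    from astropy.io import fits
    from reproject import reproject_interp

    science = fits.open("science.fits")[0]
    template = fits.open("template.fits")[0]

    warped, footprint = reproject_interp(template, science.header)
    # `warped` can now be PSF-matched and subtracted; in principle this
    # resampling could run during the first snap's 15 s exposure.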

Association

  • Are we using online algorithms? Some cases might be useful, but if we can't use them in all cases, does it matter?
  • Why do we compute periodograms for every visit? Because it's the default.
  • If we have DIA Objects that are not detected in a visit, we don't release that as an alert. We will do forced photometry on them, but not necessarily within 60 seconds, because they are not added to an alert.
  • We also need to decide how we provide real-time mask information, e.g., would a source at this RA, Dec have been observable in this visit? Metaserve will provide this information.
    • Mario Juric will follow-up to ensure we can get pixel level observability information.
  • Can one ask the butler for a set of mask planes? Not right now, but in the future.
  • Assumption is that we will run another real-bogus filter here. Does this throw out DIA Objects, or does it throw out the linkage between a new DIA Source and a DIA Object (see the sketch after this list)? Presumably the latter, but not all of the details are specified.
  • Can we rename NightMOPS to "Ephemeris Generation", since it's not even really MOPS? No disagreement stated.
  • Spuriousness is missing from the tables, add.
  • How do we "retire" a DIA Object? We say we will throw away anything that isn't picked up within the next month, because we don't want to keep junk forever. Need to decide the criteria for this.
  • If we force photometer a DIA Object, does it become a DIA Source? Right now yes; we got rid of the ForcedDIASource table, but we reserved the right to split it out. Splitting it out would save on sizing, since we wouldn't have to fit trailed models, etc.
  • Need to do forced photometry of externally specified objects, e.g., injected sources.
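
A minimal sketch of the source-object linkage step: nearest-neighbor matching within a radius on (assumed) tangent-plane coordinates. Illustrative only; the baseline algorithm (e.g. Budavari's) is more sophisticated:

    # Match new DIASources to existing DIAObjects with a k-d tree.
    import numpy as np
    from scipy.spatial import cKDTree

    def associate(object_xy, source_xy, max_radius):
        """For each source, return the index of the matched object, or -1."""
        dist, idx = cKDTree(object_xy).query(source_xy,
                                             distance_upper_bound=max_radius)
        return np.where(np.isfinite(dist), idx, -1)

    objects = np.array([[10.0, 10.0], [42.0, 7.0]])     # existing DIAObjects
    sources = np.array([[10.2, 9.9], [80.0, 80.0]])     # new DIASources
    print(associate(objects, sources, max_radius=1.0))  # [0 -1]: second source is new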

Alert Generation

  • Still need work to understand the details of this process. Not clear how we do alert publishing, or how much data throughput we actually want to use.
  • Are we going to archive our own events? Yes, and yes it will be queryable (with some issue of querying the live L1 data instead of the replicated copy). How much querying do we permit? Indices on sky? Is this a scope increase, or have we already signed up for providing a broadly queryable database?
  • Where does Alert Processing (UW's responsible portion) end? Once the alerts are sent and the appropriate DBs are updated.
  • Alerts have to go into the DB before being sent, for consistency. Does the alert publishing feed from the VOEvent database, or run in parallel (see the sketch after this list)?
  • Andy's diagram assumes one "author" for all brokers, but Don's diagram says one author per broker since they may require different formats.
  • K-T worried about authoring pulling from the VOEvent database, and also about the latency and load associated with external users affecting our processing.
  • Who designs the VOEvent database?
  • Authoring and publishing questions need more detailed discussion offline.
  • Simon is assuming User/Authentication services are provided by SUI/T, but authoring is provided by UW. Who is taking on the effort of making sure VOEvent supports our needs?
  • We have two sets of forced photometry at the end of the diagram. Why two? The first is fading objects; the second is precovery?
  • Missing forced photometry of DIA Sources on the original direct image; add.
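
A sketch of the "database before send" ordering and the one-author-per-broker model debated above; every interface named here is hypothetical, since the alert-distribution design is still open:

    # Archive alerts first for consistency, then serialize once per broker,
    # since brokers may require different formats.
    def publish_alerts(alerts, voevent_db, brokers):
        for alert in alerts:
            voevent_db.insert(alert)            # into the DB before anything is sent
        voevent_db.commit()
        for broker in brokers:
            payload = broker.serialize(alerts)  # one "author" per broker
            broker.send(payload)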

General Level 1

  • DPDD: make available a precovery service, via an API. We have pieces of this in the diagram; it should be clear how we will do this, and who will do it.
  • Who maintains image/chip/visit metadata? Provided by metaserve.
  • Use case: "Find me a 3-sigma object in this (arcminute) area." Useful for asteroids, where you have an error ellipse. Requires going back to the images, since we are not detecting at this level. The interface is probably SUI/T, but who supplies the code for the image processing? Will the current image access service return an image of an arbitrary area? Yes. Then someone just needs to write the code to work on the cutout (see the sketch after this list). Is this using direct or difference images? Difference images would be useful, but we might have to generate them off the latest template, not the original template.
  • Solar system processing. Not on this diagram, in: LDM-151 MOPS. On the MOPS diagram, asteroid assignment is a time-evolving problem. Do we have a requirement to retain this history? Not decided. Mario, K-T responsibility.
  • Zeljko: Simon and Andy, where do we stand? How hard will implementing this system be, and are we prepared to build it?
    • Single Frame Processing (Simon): some worry about some of the pixel-level corrections; they may be complex. But for all of these we have at least some implementation. Doable with the nominal 2 FTEs over two years.
    • Alert Detection: Template generation is a hard problem, requiring both Princeton and UW effort. Jim: two hard parts, varying seeing effects (which Princeton is thinking about) and DCR (UW). Image differencing and measurement are not a big concern. We might need to measure on likelihood images (offline discussion required). Robert thinks there's more to worry about in differencing.
    • Hard to understand the overall complexity of the operational system and how to coordinate all of the data flows in processing. Two general risks here: the integration risk of all the associated pipelines, and individual scientific/algorithmic risks. UW/Princeton are better set up to retire the latter; addressing the former requires cross-site work. Also requires significant QA infrastructure for evaluating science products, which requires prioritization decisions from Mario and Jacek.
    • Association pipeline is hopefully not beyond the state of the art; implement the Budavari algorithm. Debate as to whether this belongs inside or outside the database. This association toolkit is also important for QA purposes. Want to preserve the ability to run it on, e.g., a laptop.
    • Where is ghost/glint finding/masking in this diagram?
    • The aggregate measurements bubble is not implemented. There is no milestone in the plan for variability characterization; need to add one.
    • Alert broker: going to require significant work. Community prototypes are available for alert-related work, but we will want to improve on them. Lots of interfaces in this area to SUI/T and NCSA.
  • What is the impact of these L1 pipelines on the QA pipelines? 
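
A toy illustration of the precovery use case above: given a difference-image cutout and its variance, report whether any pixel exceeds 3 sigma. Pure numpy; a real service would run stack measurement code (e.g. a PSF fit) on the cutout:

    import numpy as np

    def max_significance(cutout, variance):
        """Peak per-pixel S/N in the cutout."""
        return (cutout / np.sqrt(variance)).max()

    rng = np.random.default_rng(0)
    cutout = rng.normal(0.0, 1.0, (30, 30))
    cutout[15, 15] += 4.0                      # fake signal at the ephemeris position
    print(max_significance(cutout, np.ones_like(cutout)) > 3.0)  # True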

LDM-151 Level 2

Jim Bosch's pseudocode: drp-pseudocode.py

Flowchart drawn from the pseudocode: Level2 Flowchart.pdf

  • This flow chart includes lots of boxes with new algorithms that haven't been tested before. Is this an acceptable amount of risk? Particularly for Kaiser coadds: would it be safer to build PSF-matched coadds first, then build Kaiser coadds later in the survey if we have time? The response is that Kaiser coadds simplify the flow, since we won't need to build both deep coadds and best-seeing coadds (they are the same in a Kaiser coadd). They also gain us ~0.2 mag of depth (see the arithmetic after this list). That could be gained later in the survey, but PSF matching will still require complexity.
  • Questions about the timeframe by which we must have templates.
  • Came back to the question of whether we need to provide forced photometry on single direct images, given that we can't deblend them, and that we think the same information is recoverable by forced photometry on the difference images.
  • Open questions from Jim: how do we generate masks? Where do fake sources appear? 
  • Where is the atmospheric transmission function for each source? The question came up of what SED is assumed for photometry. Robert says he will make a best guess of an object SED, then use that for doing the photometry, and will provide a mechanism for users to undo this (see the SED sketch after this list). This might be more complicated to explain to the user (that we used an assumed SED) than a flat SED, but the argument is that most people won't care. Some debate over how to parameterize the SEDs.
  • Zeljko Ivezic will ask David Kirkby about the statement "50% of LSST gold sample galaxies will be blended at some WL-significant level".
  • There is concern that our algorithms are complicated in order to support stringent requirements for WL, but we don't want to sacrifice the performance of the rest of the survey due to difficulties encountered in these algorithms. On the other hand, it is difficult for Jim and Robert to tell the WL community that they won't need to do pixel processing on their own if we are not committing to supporting state-of-the-art algorithms in DM. Some general debate on the level of risk associated with these algorithms.
  • K-T: do we have a definition of, or requirements on, the problem of algorithm-middleware interaction needed to support the multiple scales of processing we require? Needs a collaborative effort between Science Pipelines and NCSA/SLAC. Jim can try to enumerate potential methods for dividing work/multiprocessing, so that we can iterate with someone versed in the middleware to judge what is feasible.
  • Zeljko: the community is asking to get their hands on the details of the algorithms that science users will interact with. Jim: there is tension between developing the end-to-end system and developing individual algorithms to completion; we have tended towards the end-to-end system. Don't want to spend time on low-risk algorithms right now. Lots of prioritization questions here. Is JDS maintaining a risk register?
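
For the ~0.2 mag depth gain quoted above, the equivalent-exposure-time arithmetic (standard magnitude/signal-to-noise relations, nothing LSST-specific):

    # 0.2 mag of depth is a factor 10**(0.2/2.5) ~ 1.20 in limiting-flux S/N,
    # i.e. ~1.45x the effective exposure time in the background-limited regime.
    depth_gain_mag = 0.2
    snr_factor = 10 ** (depth_gain_mag / 2.5)   # ~1.20
    exposure_factor = snr_factor ** 2           # ~1.45
    print(snr_factor, exposure_factor)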
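
And a toy sketch of the assumed-SED bookkeeping: a band flux depends on the SED integrated through the bandpass, so persisting a ratio like the one below would let users undo the assumption. The Gaussian bandpass and power-law SED are made up, not real LSST curves:

    import numpy as np

    wl = np.linspace(400.0, 700.0, 301)                    # wavelength, nm
    bandpass = np.exp(-0.5 * ((wl - 550.0) / 40.0) ** 2)   # toy throughput

    def band_flux(sed, bandpass, wl):
        # Photon-counting integral of an SED through a bandpass.
        return np.trapz(sed * bandpass * wl, wl)

    assumed_sed = (wl / 550.0) ** -2   # stand-in for the "best guess" object SED
    flat_sed = np.ones_like(wl)
    # Persisting this factor lets a user convert to a flat-SED convention:
    undo_factor = band_flux(flat_sed, bandpass, wl) / band_flux(assumed_sed, bandpass, wl)
    print(undo_factor)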

Status of Level 2

  • Bootstrap is basically done. Requires some attention to individual pieces, and may require some more middleware.
  • This diagram doesn't have the right level of resolution to show who (Princeton or UW) owns which specific pieces. Worry that the boxes in this diagram are in general too big to be estimable, so we can't use it to judge whether we are adequately staffed.
  • Jointcal is underway. Needs photometric and chromatic improvements, also handling of "frozen-in" effects like tree rings.
  • Image characterization - Don't have PSF modeling. Also depends on chromatic jointcal. 
  • Background match and coadd with rejection - currently uncertain. Needs middleware for communication between processes, since it's dependent on many visits. Discuss on Friday.
  • Warp and Correlate - New code, but we probably have the underlying primitives.
  • Decorrelate and Match - The big question here, since this is the Kaiser coadd step. A hard box: "two years of a good person".
  • Diffim - The Level 1 implementation could be sufficient, or we might want to spend effort here if we have effort to spare.
  • Detection - Good as is, but we might want to experiment with cross-band detection. Further work might depend on the "associate" part of "associate and deblend".
  • Resolve overlaps - Tricky; nothing deep or algorithmic about this, but it needs care. "Easy to do a bad job on it." The issue is a family of deblended sources on one side of a tract boundary that overlaps a family on the other tract, but with different sources.
  • Multifit is hard, and there is a significant performance question. Can we do this with just maximum likelihoods instead of full posteriors? For Robert's science, yes. For weak lensing, no: NGMix (full posterior) is currently doing better than the im3shape (ML) code for weak lensing (see the toy comparison after this list). If performance becomes a problem, the descope is to reduce the number of samples and simplify the PSF model. Is this something that DESC should tell us? Probably not.
  • Want a DPDD-level statement about masks/mapping.
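
A toy version of the maximum-likelihood-versus-posterior trade-off above, using a Gaussian mean as a stand-in for a shape parameter; the point is that the ML peak alone discards the width that weak lensing needs:

    import numpy as np

    rng = np.random.default_rng(1)
    data = rng.normal(0.3, 1.0, size=50)      # 50 noisy "shape" measurements

    ml_estimate = data.mean()                 # maximum likelihood: a single number

    # Posterior under a flat prior is N(mean, sigma**2 / n); keep samples of it.
    samples = rng.normal(ml_estimate, 1.0 / np.sqrt(len(data)), size=1000)
    # A descope shrinks `size=1000`; dropping to ML alone loses the width entirely.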