Meeting began @ 11:00am; ended 12:45pm.

Attending

Next meeting: @ 10am.

Use Cases

Requirements

Many draft requirements have been written since the last meeting. Jim Bosch worried that his requirements were more detailed than everyone else's. After some discussion we clarified that parent/child requirement relationships are allowed, where a child can provide concrete variations of a more general requirement. Each requirement then has one verifiable component. It was also noted that some of the detail currently in the Specification section could move to the Discussion section. Since the spreadsheet does not have IDs, we are currently indicating parent/child relationships by assigning a temporary ID and marking children with a "-a", "-b" suffix on that ID. Tim Jenness will handle those in the MagicDraw migration. Unknown User (pschella) split a requirement as a proof of concept and we agreed the result was better.
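
As a purely illustrative sketch of how the temporary-ID convention could be consumed during the migration (the IDs and the split_requirement_id helper below are invented for this example, not part of the spreadsheet or the MagicDraw tooling):

```python
# Illustrative only: split temporary requirement IDs of the form "X" (parent)
# and "X-a", "X-b" (children) into pairs a migration script could use.
import re
from typing import Optional, Tuple

_ID_RE = re.compile(r"^(?P<parent>.+?)(?:-(?P<child>[a-z]))?$")

def split_requirement_id(temp_id: str) -> Tuple[str, Optional[str]]:
    """Return (parent_id, child_suffix) for a temporary requirement ID."""
    match = _ID_RE.match(temp_id)
    return match.group("parent"), match.group("child")

assert split_requirement_id("BUTLER-7") == ("BUTLER-7", None)
assert split_requirement_id("BUTLER-7-a") == ("BUTLER-7", "a")
```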

There was some discussion over whether the pluggability requirements as written were sufficient. In at least one case it was felt that whilst the intent was to require a generic plug-in system for Data I/O, the resultant text was ambiguous. We agreed with Jim Bosch that there are two distinct requirements for the plug-in system: the ability to define a new Dataset Type and the associated low-level code for reading and writing it; and the ability to control what happens for each Dataset Type via a text configuration file that can be changed per-process, possibly resulting in the same Python object being materialized as a completely different entity on disk. Finally, Russell Owen felt we should be explicit in stating that the butler can support any Dataset Type associated with any Python object so long as plug-in code exists that understands it.
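
A minimal sketch of the two plug-in requirements as described, using invented names (register_formatter, DATASET_TYPE_CONFIG, put) rather than any agreed Butler interface:

```python
# Sketch only: a plug-in registry separating (1) the code that knows how to
# read/write a Dataset Type from (2) per-process configuration that says
# which plug-in a given Dataset Type should use.
from typing import Any, Callable, Dict

# Requirement 1: register low-level readers/writers for new Dataset Types.
FORMATTERS: Dict[str, Dict[str, Callable[..., Any]]] = {}

def register_formatter(name: str, reader: Callable[[str], Any],
                       writer: Callable[[Any, str], None]) -> None:
    FORMATTERS[name] = {"read": reader, "write": writer}

# Requirement 2: a text configuration, changeable per-process, mapping each
# Dataset Type to a formatter; the same Python object could therefore be
# materialized differently on disk depending on this mapping.
DATASET_TYPE_CONFIG = {
    "calexp": "fits_formatter",        # hypothetical names
    "sourceTable": "parquet_formatter",
}

def put(obj: Any, dataset_type: str, path: str) -> None:
    formatter = FORMATTERS[DATASET_TYPE_CONFIG[dataset_type]]
    formatter["write"](obj, path)
```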

Jim Bosch was concerned that we do not have a good definition of a partial Dataset: for example, a WCS being written as part of a composite calexp dataset but then being read back in as part of a raw dataset.
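
To make the concern concrete, here is a toy sketch of a composite dataset whose WCS component can be read back on its own; the CalExp and get_component names are illustrative, not the Butler API:

```python
# Sketch only: a composite dataset with individually retrievable components.
from dataclasses import dataclass

@dataclass
class Wcs:
    crval: tuple              # sky coordinates of the reference pixel

@dataclass
class CalExp:                 # composite: image plus WCS (PSF, mask, ... omitted)
    image: list
    wcs: Wcs

def get_component(dataset: object, component: str) -> object:
    """Read back one part of a composite, e.g. just the WCS of a calexp."""
    return getattr(dataset, component)

calexp = CalExp(image=[[0.0]], wcs=Wcs(crval=(150.0, 2.2)))
wcs_only = get_component(calexp, "wcs")   # the "partial Dataset" in question
```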

We had a very long discussion on concurrent access to Data Repositories.

  • What happens if a single repository is handling a get() from one process and a put() of the same Dataset from another process? We decided it was easier to leave this as "undefined behavior": the get() may return an old version, block on the new version, or return an error.
  • How do LSP batch jobs from a Notebook gather all their output products and how do they share an input repository?
  • Are DIAObjects an issue? Is the Butler involved in the DIAObject table read/writes during prompt processing?
  • Composite Datasets must complete writing of all parts of the Dataset for a put() to complete. If there is any problem writing any of the parts, the entire put() should be rolled back (see the sketch after this list).
  • We had a vigorous debate on full ACID compliance for put() and get(). There was also debate on whether a put() should always block and raise an exception on error, and whether it is ever okay for a put() to trigger a deferred write that might fail hours later even after the process has completed. Should that behavior be configurable? Do we ever need the deferred write?
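
A rough sketch of the roll-back behaviour discussed for composite put(), assuming an invented atomic_put context manager over plain files; this illustrates the idea only and is not a proposed implementation:

```python
# Sketch only: an all-or-nothing put() for a composite Dataset. If writing
# any component fails, previously written components are removed so the
# repository never holds a partially written composite.
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def atomic_put(paths_written: list):
    try:
        yield paths_written
    except Exception:
        # Roll back: delete every component written so far, then re-raise.
        for path in paths_written:
            if os.path.exists(path):
                os.remove(path)
        raise

def write_component(directory: str, name: str, payload: bytes, written: list) -> None:
    path = os.path.join(directory, name)
    with open(path, "wb") as f:
        f.write(payload)
    written.append(path)

with tempfile.TemporaryDirectory() as repo:
    written: list = []
    with atomic_put(written):
        write_component(repo, "image.bin", b"...", written)
        write_component(repo, "wcs.bin", b"...", written)
        # an exception here would remove both files written above
```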

Simon Krughoff clarified that multiple input repositories should be handled in two different ways. He has use cases where retrieving the first item that matches is what is required. He also has use cases where a Dataset should be returned from each Data Repository.
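
A hedged sketch of the two retrieval modes, modelling input repositories as an ordered list of mappings; get_first and get_all are placeholder names, not the Butler interface:

```python
# Sketch only: two ways of resolving a Dataset across multiple input
# repositories, here modelled as dicts keyed by DataID.
from typing import Any, List, Optional

def get_first(repos: List[dict], data_id: Any) -> Optional[Any]:
    """Return the first match, searching repositories in order."""
    for repo in repos:
        if data_id in repo:
            return repo[data_id]
    return None

def get_all(repos: List[dict], data_id: Any) -> List[Any]:
    """Return one Dataset per repository that has a match."""
    return [repo[data_id] for repo in repos if data_id in repo]

repo_a = {("visit", 1): "calexp from A"}
repo_b = {("visit", 1): "calexp from B"}
assert get_first([repo_a, repo_b], ("visit", 1)) == "calexp from A"
assert get_all([repo_a, repo_b], ("visit", 1)) == ["calexp from A", "calexp from B"]
```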

Tim Jenness was worried about Data Discovery. If you have a coadd locally, how do you discover the DataIDs of the PVIs that went into the coadd? Do you query the Data Backbone? Is there sufficient information in the coadd itself?
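
For illustration only, the kind of provenance lookup being asked about might look like the following; the COADD_INPUTS mapping is an assumption standing in for whatever the coadd carries internally or for a query against the Data Backbone:

```python
# Sketch only: discovering which PVI DataIDs went into a local coadd.
from typing import Dict, List, Tuple

# Hypothetical provenance: coadd DataID -> list of input PVI DataIDs.
COADD_INPUTS: Dict[Tuple, List[Tuple]] = {
    ("tract", 0, "patch", "1,1"): [("visit", 101, "ccd", 3),
                                   ("visit", 102, "ccd", 3)],
}

def inputs_for_coadd(coadd_id: Tuple) -> List[Tuple]:
    """Return the PVI DataIDs recorded as inputs to the given coadd."""
    return COADD_INPUTS.get(coadd_id, [])

print(inputs_for_coadd(("tract", 0, "patch", "1,1")))
```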

Michelle Gower noted that some of the requirements are deliberately vague in the specification even though it is clear that particular technologies will be involved (e.g., FITS, VO Services).

  • Tim Jenness to add a requirement for reading raw FITS files and for writing out FITS coadds (we have a requirement to serve FITS image products).
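
As a reminder of the sort of capability required, rather than a proposed implementation, writing an image product as FITS and reading a FITS file back can be done with astropy; the file names and the zero-filled image below are placeholders:

```python
# Sketch only: FITS write and read with astropy.io.fits.
import numpy as np
from astropy.io import fits

# Writing a coadd-like image product as a FITS file.
coadd = np.zeros((100, 100), dtype=np.float32)      # stand-in for a real coadd
fits.PrimaryHDU(data=coadd).writeto("coadd.fits", overwrite=True)

# Reading a FITS file back: primary header plus pixel data.
with fits.open("coadd.fits") as hdul:               # placeholder for a raw file
    header = hdul[0].header
    pixels = hdul[0].data
```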

