Meeting began at 11:01 and ended at 13:03, Project Time.

Attending

Next meeting: 10am.

Requirements

We went through requirements, focusing on those that have open comments.

  • Replace "data backbone" with "data repository" in the "Data Ingest" requirement to avoid confusion.
  • We discussed the requirement for being able to process files that are not arranged in a special directory tree. The feeling was that some kind of YAML mapping file would be required for SuperTask, because "pre-flight" would have returned just such a mapping file. This triggered a discussion as to whether it is desirable for the Data Input System to be able to read a file written by the Data Output System without requiring additional mapping information (although this is hard to phrase generically if the output file is a JPEG).
    • Michelle Gower noted that she currently uses ingestImages.py, but Jim Bosch and Simon Krughoff felt that a simple mapping file is sufficient. Michelle Gower agreed, provided the file format is easy to edit. The requirement was tweaked accordingly (a hypothetical sketch of such a mapping file appears after this list).
  • Provenance was discussed again. We continued to debate how much provenance should go in the files themselves, and how much should be written to a separate file as part of a composite dataset and for ingestion into the provenance database (overriding what the workflow system guessed). Tim Jenness felt that roots and parents should go in the files, but that was not universally accepted; others felt that it is not a problem for every provenance lookup to be made to the provenance database using the dataset ID. (Presumably that only works if the provenance of the file has been stored in the Data Backbone, whereas storing parents in the file has more chance of working if those parents are in the data release and the file is a derived product made outside the processing system.)
    • It was stated that SuperTask should have a mechanism for dumping provenance information to a file; we should consider adding that to the SuperTask requirements (a hypothetical sketch of such a dump appears after this list).
    • Most people felt that there should be a requirement for a composite dataset to be persisted into a single entity for easy sharing and transfer. Jim Bosch felt that provenance should not be included in this and that the database should always be the source of truth.
  • Tim Jenness to update REQ4 (Metadata merging) to make it more explicit.
  • REQ11 (input and output repositories) was split into separate requirements for input and output. Michelle Gower also requested that the word "plug-in" be used if that was the intent.
  • Jim Bosch to rewrite REQ6 to make it clear that postage stamps and predefined subsets are different subsets.
  • Simon Krughoff to adjust REQ10 to make it clear whether multiple repositories means one butler knowing about two repositories or two butlers.
  • REQ111 was discussed: can a database be a Data Repository? After some debate, the answer was yes.
  • REQ14 was adjusted to explicitly mention file paths rather than data locations.
  • Russell Owen reported that he had talked to Andy Salnikov about the L1DB system: DIAObject information will be written each time, and the Butler will likely be avoided.
  • REQ544 to be merged with REQ3: Dataset Types must be predictable by the ops system at pre-flight and must not depend on the pixel data itself; they are purely dependent on configuration and task code (see the sketch after this list).
  • REQ599 and REQ5 were rewritten to use consistent language. Simon Krughoff needs to write a requirement for local caching for notebook users.
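
To make the mapping-file discussion concrete, here is a minimal sketch of what an easy-to-edit YAML mapping might look like and how it could be written and read from Python. The key names (files, dataset_type, data_id) and the data ID fields are illustrative assumptions only; no format was agreed at the meeting.

    import yaml  # PyYAML

    # Hypothetical mapping from plain file paths (outside any special
    # directory tree) to the dataset type and data ID needed to read them.
    # Every key name here is an assumption, not an agreed schema.
    mapping = {
        "files": [
            {"path": "raw/image_000.fits",
             "dataset_type": "raw",
             "data_id": {"visit": 903334, "ccd": 12}},
            {"path": "raw/image_001.fits",
             "dataset_type": "raw",
             "data_id": {"visit": 903334, "ccd": 13}},
        ],
    }

    # YAML keeps the file easy to edit by hand, which was the condition
    # under which the simple mapping file was accepted.
    with open("mapping.yaml", "w") as f:
        yaml.safe_dump(mapping, f, default_flow_style=False)

    # The Data Input System side of the round trip.
    with open("mapping.yaml") as f:
        assert yaml.safe_load(f) == mapping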
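
Similarly, a sketch of the per-dataset provenance dump that SuperTask might produce. All field names are assumptions; the meeting only agreed that a dump mechanism should be considered, and Jim Bosch's position was that the database, not this file, remains the source of truth.

    import json

    # Hypothetical provenance record written alongside a composite dataset.
    # The dataset ID is the key for lookups in the provenance database;
    # roots and parents are the fields Tim Jenness argued should also live
    # in the file so out-of-release derived products can still resolve.
    provenance = {
        "dataset_id": "d3adb33f-0000-0000-0000-000000000000",
        "roots": ["DR5"],
        "parents": ["calexp/903334/12", "calexp/903334/13"],
    }

    with open("provenance.json", "w") as f:
        json.dump(provenance, f, indent=2)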
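
Finally, a sketch of the REQ544/REQ3 constraint that output Dataset Types be computable at pre-flight. The point is that the enumeration below uses only the task's code and configuration, never the pixel data; the class, method, and config field names are invented for illustration.

    # Hypothetical task whose output Dataset Types depend only on its
    # configuration and code, never on the pixels it will later process.
    class ExampleConfig:
        do_write_background = True

    class ExampleTask:
        @staticmethod
        def output_dataset_types(config):
            """Enumerate output Dataset Types from config and code alone."""
            types = ["calexp", "src"]
            if config.do_write_background:
                types.append("calexpBackground")
            return types

    # Pre-flight can run this before any data exists:
    print(ExampleTask.output_dataset_types(ExampleConfig()))
    # -> ['calexp', 'src', 'calexpBackground']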