Meeting began @ 10:01; ended 11:58 (Project Time)

Attending

Next meeting:  @ 11am.

Use Cases

We agreed that the use of an "OPS" prefix for Data Facility use cases was confusing, given that many of the other use cases also relate to operations. It was decided they would be renamed to use an "LDF" prefix. The use cases from Dominique Boutigny will be relabelled as "LDF" use cases but starting with number 101. Tim Jenness would note in the final Use Case document that LDF101 and above came from a different team.

Requirements

For this meeting we went through the first 50 requirements and dealt with comments that had been made on them.

  • REQ4 was clarified to explicitly state that file and database information are to be combined. Further detail was left out, as including it would have driven the architectural design, and other requirements are already explicit about FITS and database access.
  • REQ8 "DataUnit Lookup" was discussed and deemed confusing. Is it talking about provenance lookup from an existing coadd or data discovery with the intent of making a coadd?
  • Simon Krughoff to consider splitting REQ8 into two discrete units  
  • REQ9 "Multiple Chained Input Repositories". Should we specify in requirements how lookup ordering will work? Is this a Data Discovery System requirement or Data Input System requirement? We decided it was Discovery system and adjusted the requirement to be explicit. There was some confusion as to how REQ9 and REQ10 related to each other. REQ9 is chaining repositories containing different datasets, whereas REQ10 is combining repositories that have the same datasets processed in different ways (for example, DR1 and DR2).
    • Michelle Gower: When we write provenance information to the output repository, do we need to include the input data repository for the chained case?
    • Brian Van Klaveren: Should the Data Discovery system be able to return all matches from all available repositories, allowing the user to decide which subset to use?
    • Russell Owen: Can we consider the ability to restrict chaining such that finding any match to a DatasetExpression would stop searches of additional repositories?
    • Simon Krughoff to add a new Data Discovery System requirement to explicitly require searching of remote repositories 
  • During our discussion, Michelle Gower described a case where operators would want to blacklist certain observations as part of exploratory work. This would be in addition to the global data quality flag. We realized we had no requirement allowing local overrides of certain queries.
    • Tim Jenness to add local configuration-based blacklisting requirement  
  • REQ11 "Remote Output Data Repositories": Should we declare core functionality for all Output Repositories? Not all output repositories will be as flexible.
  • REQ12 "I/O of Arbitrary Objects": A possible duplicate of REQ1. We need to understand what we mean by "arbitrary" and whether this is a more specific version of REQ1.
  • REQ13 "Data Discovery" is a generic Data Discovery System requirement. Can anything in the DBB be queried? If you take a subset of a data repository and metadata in the original is updated (e.g., seeing calculations or a bad observation flag), should the subset be able to update its copy? Can you query arbitrary EFD channels, and if you can, when you subset how much of the EFD should be copied over? Should the data discovery system always do a remote query if it has a local subset?
    • Simon Krughoff to consider addition of a new requirement dealing with metadata syncing to subsets 
    • Simon Krughoff to consider whether subsets should always query parent data repository 
  • REQ17-d talks about the Data Input System being able to read the outputs from the Notebook Batch System. The Notebook Batch System is somewhat underspecified at this time although Brian Van Klaveren suggested we look at the IVOA UWS standard. After some discussion it was clear that if each batch job is writing a local output repository, the process that harvests the output files should not only be placing them back in the user's shared VOSpace, but should also be clever enough to merge all the output data into a single output repository. Without this the notebook user will not easily know which data repositories were created as they don't know how many jobs were run.
    • Tim Jenness to try to write a harvesting requirement for the Notebook Batch System 
    • Unknown User (pschella) to write an explicit requirement that notebook users shall be able to access repositories from their shared workspace. 
  • We started to discuss REQ20 but felt that it would be better to defer clarification until Jim Bosch is available.
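The chained-repository behaviour debated under REQ9 (search order, Russell Owen's stop-on-first-match restriction, Brian Van Klaveren's return-all-matches alternative, and Michelle Gower's local blacklist override) could be sketched roughly as below. Every name here (`Repository`, `ChainedDiscovery`, `find`, `blacklist`) is an illustrative assumption for discussion, not part of any actual Data Discovery System API.

```python
# Hypothetical sketch of chained-repository lookup (REQ9); all names invented.
from dataclasses import dataclass, field


@dataclass
class Repository:
    """A toy repository mapping dataset expressions to dataset IDs."""
    name: str
    datasets: dict  # e.g. {"calexp/visit=42": "dr1-42"}

    def find(self, expression: str):
        return [v for k, v in self.datasets.items() if k == expression]


@dataclass
class ChainedDiscovery:
    """Searches an ordered chain of repositories for a dataset expression."""
    chain: list                                   # ordered list of Repository
    blacklist: set = field(default_factory=set)   # local override for exploratory work
    stop_on_first_match: bool = True              # restrict chaining after a hit?

    def find(self, expression: str):
        matches = []
        for repo in self.chain:
            # Apply the local blacklist before deciding whether to stop.
            hits = [d for d in repo.find(expression) if d not in self.blacklist]
            matches.extend(hits)
            if hits and self.stop_on_first_match:
                break                             # skip remaining repositories
        return matches
```

With `stop_on_first_match=False` this behaves as Brian Van Klaveren suggested, returning every match from every available repository and leaving the choice of subset to the user; with the blacklist populated, a repository's match can be suppressed without modifying the global data quality flag.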

The plan for the next meeting is to go through the second half of the requirements. Once the open actions from this meeting are dealt with, we do not expect major additions to the requirements.
