Meeting began:  @ 11:05; ended 14:58 Project time

Attending


Next meeting  @ 10am

Do not rearrange order of requirements on spreadsheet after evening of .

Previous Actions

We went through previous actions and closed some. A number of actions are still pending and it was agreed to close these before the next meeting.

  • Russell Owen felt that performance requirements were needed for alert issuing but we weren't sure what the allocated time budget was for these. Maybe Kian-Tat Lim would know.

Requirements

  • Brian Van Klaveren to add a requirement for repository versioning and migration.
  • There was some discussion on whether we should support parallel writes to a shared output repository. It was felt that it was easier to provide a scheme for merging lots of small repositories.
  • A new requirement had been added by Unknown User (pschella) relating to supporting old APIs as much as possible. It was felt that we could not have this as a requirement but would consider it as a design driver. The requirement will be removed.
  • Brian Van Klaveren wondered whether we should promise to support the current repository format and the previous format, providing a migration path without requiring that everyone upgrades at once. Should we support in-place upgrading? That might be okay as an option if it meant that the metadata was reorganized without touching the pixel data.
  • "Data Repository Metadata lookup" requirement: There was some debate over whether this requirement is already implicitly a feature of the Data Discovery System (although that follows "by definition" from the glossary entry). New datasets must be able to define new metadata.
    • Jim Bosch to clarify requirements in the area of metadata lookup in the data repository, possibly by adding a new requirement or by clarifying "Data Repository Metadata lookup".
  • The standalone requirement relating to use case DRP27 was removed and integrated into REQ18
  • "Provenance tracing of data quality" needs to be clarified to indicate that the provenance tracing is independent of the ability to query data quality. Quality tracking is a DBB requirement.
    • Tim Jenness to update the provenance tracing requirement to remove data quality from the specification.
  • "Filter by non datasetref key": Very important for batch processing. Jim Bosch wondered if we could do this by creating a virtual repository and then removing items from it. Simon Krughoff wondered if we wanted to add a use case for similar functionality to filter out datasets for testing. Jim Bosch was against the idea of doing filtering in Python, although he considered that a simple "except these datasets" might be an acceptable Python filter.
    • Michelle Gower to split "filter by non datasetref key" requirement into two: one for virtual and one for late stage filtering 
  • Caching. We had a long discussion on caching where it was clear that we were thinking about two different types of caching. Tim Jenness was wanting a shared infrastructure to make it easy for a file that has just been downloaded to be stored for quick subsequent retrieval. Others thought that caching was a Data Repository function that would be part of the requirement for materializing a subset.
    • Tim Jenness to rewrite the caching requirement to be explicit that it concerns rerunning a program and expecting downloaded data to be available locally the second time (combine with "Dynamic caching to local disk") 
  • We had an extensive discussion on data repository subsets and whether they are connected to their parent or not. The consensus was that it's easier if they are completely distinct. We decided that in the cases where metadata has become stale (for example quality flags or seeing calculations have changed) the subset should be recreated. Do we want garbage collection for subset data repositories?
  • "Execution on archived data": This needs further discussion as it's not clear that the data discovery system should be able to trigger retrieval from tape. Preflight will determine that some file is needed but something else has to request that file from the DBB if it is not on disk.
  • Michelle Gower to add a requirement to say that it is possible to configure the Data Output System such that it is an error if an attempt is made to persist a dataset that is already present in the output repository.
  • Tim Jenness to see if we are missing a requirement for the Supertask to be able to write a full provenance graph to a file for later harvesting.
  • "Aliases to Queries": This was discussed and Brian Van Klaveren suggested that the requirement should be rewritten such that it could be implemented as a database view.
    • Simon Krughoff to clarify "Aliases to Queries" such that it could be implemented as a view or as a butler plugin.
  • Michelle Gower asked whether we have a requirement to be able to cleanly delete a repository. Some repositories may include rows in database tables.
    • Jim Bosch to decide if we need a "delete repository cleanly" requirement.
  • Tim Jenness to clarify discussions of REQ164-b (reading WCS for raw data) 
  • There was some discussion about header propagation. It seems that in the current system only nominated headers are forwarded to output products. There is no attempt to combine headers from multiple observations (handling environmental headers and combining start/end headers correctly). Tim Jenness expressed some surprise that we expect people to use provenance lookup to find standard metadata. The Starlink pipelines do merge this information consistently.
  • We examined "Local proxy" requirement and decided it should be dropped.
  • Michelle Gower to read the four use cases associated with the "Output staging" requirement to determine whether they are consistent with the DBB design.
  • We deleted the general FITS writing requirement on the basis that it is covered by the image and table requirements.
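
As an illustration of the "Aliases to Queries" suggestion above, the following is a minimal sketch of how a named alias could be backed by a database view. It uses Python's standard sqlite3 module; the table name `datasets`, its columns, and the alias name `good_seeing_calexps` are all hypothetical and are not taken from any butler schema discussed in the meeting.

```python
import sqlite3


def make_alias_demo():
    """Build an in-memory database with a hypothetical dataset table and one alias view."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical registry table; real repository metadata would look different.
    conn.execute(
        "CREATE TABLE datasets ("
        "id INTEGER PRIMARY KEY, dataset_type TEXT, visit INTEGER, seeing REAL)"
    )
    conn.executemany(
        "INSERT INTO datasets (dataset_type, visit, seeing) VALUES (?, ?, ?)",
        [
            ("calexp", 100, 0.8),
            ("calexp", 101, 1.4),
            ("raw", 100, 0.8),
            ("calexp", 102, 0.9),
        ],
    )
    # The alias is simply a view: a stored query that can be selected from by name.
    conn.execute(
        "CREATE VIEW good_seeing_calexps AS "
        "SELECT id, visit, seeing FROM datasets "
        "WHERE dataset_type = 'calexp' AND seeing < 1.0"
    )
    return conn


conn = make_alias_demo()
# Users refer to the alias name instead of repeating the underlying query.
rows = conn.execute(
    "SELECT visit FROM good_seeing_calexps ORDER BY visit"
).fetchall()
print(rows)
```

Because a view is re-evaluated on each query, an alias defined this way would track changes to the underlying metadata rather than freezing a result set; whether that is the desired behaviour is a question for the rewritten requirement.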

