...

  • There was some discussion on whether we should support parallel writes to a shared output repository. It was felt that it would be easier to provide a scheme for merging many small repositories.
  • A new requirement had been added by pschella relating to supporting old APIs as much as possible. It was felt that we could not have this as a requirement but would consider it as a design driver. The requirement will be removed.
  • Brian Van Klaveren wondered whether we should promise to support the current repository format and the previous format, providing a migration path without requiring that everyone upgrades at once. Should we support in-place upgrading? That might be okay as an option if it meant that the metadata was reorganized without touching the pixel data.
  • "Data Repository Metadata lookup" requirement: There was some debate over this requirement and whether it is implicitly a feature of Data Discovery System already (although that is "by definition" from the glossary definition). New datasets must be able to define new metadata.
    •  Jim Bosch to clarify the requirements in the area of metadata lookup in a data repository, possibly adding a new requirement or clarifying "Data Repository Metadata lookup".
  • The standalone requirement relating to use case DRP27 was removed and integrated into REQ18.
  • "Provenance tracing of data quality" needs to be clarified to indicate that the provenance tracing is independent of the ability to query data quality. Quality tracking is a DBB requirement.
    •  Tim Jenness to update the provenance tracing requirement to remove data quality from the specification.
  • "Filter by non datasertref key": Very important for batch processing. Jim Bosch wondered if we could do this by creating a virtual repository and the removing items from it. Simon Krughoff wondered if we wanted to add a use case for similar functionality to filter out datasets for testing. Jim Bosch was against the idea of doing filtering in Python although he considered that a simple "except these datasets" might be an acceptable python filter.
    •  Michelle Gower to split the "filter by non-datasetref key" requirement into two: one for virtual repositories and one for late-stage filtering.
  • Caching. We had a long discussion on caching where it was clear that we were thinking about two different types of caching. Tim Jenness wanted a shared infrastructure to make it easy for a file that has just been downloaded to be stored for quick subsequent retrieval. Others thought that caching was a Data Repository function that would be part of the requirement for materializing a subset.
    •  Tim Jenness to rewrite the caching requirement to be explicit that it concerns rerunning a program and expecting downloaded data to be available locally the second time (combine with "Dynamic caching to local disk"); a sketch of this style of cache follows the list.
  • We had an extensive discussion on data repository subsets and whether they are connected to their parent or not. The consensus was that it is easier if they are completely distinct. We decided that in cases where metadata has become stale (for example, quality flags or seeing calculations have changed) the subset should be recreated. Do we want garbage collection of subset data repositories?
  • "Execution on archived data": This needs further discussion as it's not clear that the data discovery system should be able to trigger retrieval from tape. Preflight will determine that some file is needed but something else has to request that file from the DBB if it is not on disk.
  •  Michelle Gower to add a requirement saying that it is possible to configure the Data Output System such that it is an error if an attempt is made to persist a dataset that is already present in the output repository (a sketch of this behaviour follows the list).
  •  Tim Jenness to see if we are missing a requirement for the SuperTask to be able to write a full provenance graph to a file for later harvesting (a minimal sketch follows the list).
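
As a purely illustrative sketch of the "except these datasets" style of Python filter discussed under "Filter by non-datasetref key" (DatasetRef and except_these are hypothetical stand-ins, not an actual Butler API):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRef:
    """Minimal hypothetical stand-in for a dataset reference."""
    dataset_type: str
    data_id: tuple


def except_these(refs, excluded):
    """Keep every ref that is not in the excluded set."""
    excluded = set(excluded)
    return [ref for ref in refs if ref not in excluded]


refs = [DatasetRef("calexp", ("visit", 903334)),
        DatasetRef("calexp", ("visit", 903336))]
subset = except_these(refs, [DatasetRef("calexp", ("visit", 903336))])
print(subset)  # only visit 903334 survives the filter
```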
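A minimal sketch of the rerun-oriented cache Tim Jenness described: the first run downloads into a local cache, and a rerun gets an immediate local hit. The cache location, the hashing scheme, and the caller-supplied fetch_remote function are all assumptions for illustration:

```python
import hashlib
from pathlib import Path

CACHE_DIR = Path("~/.cache/data_downloads").expanduser()


def cached_fetch(uri: str, fetch_remote) -> Path:
    """Return a local path for uri, downloading only on the first call.

    fetch_remote(uri, dest) is a caller-supplied download function.
    """
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    local = CACHE_DIR / hashlib.sha256(uri.encode()).hexdigest()
    if not local.exists():              # first run: fetch and cache
        tmp = local.with_suffix(".tmp")
        fetch_remote(uri, tmp)
        tmp.rename(local)               # publish atomically into the cache
    return local                        # later runs: immediate local hit
```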
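A sketch of the proposed configurable error-on-existing-dataset behaviour for the Data Output System; OutputRepo, its pickle-based storage, and the clobber flag are illustrative assumptions only:

```python
from pathlib import Path
import pickle


class OutputRepo:
    """Hypothetical output repository with a clobber switch."""

    def __init__(self, root: str, clobber: bool = False):
        self.root = Path(root)
        self.clobber = clobber

    def put(self, dataset, name: str) -> None:
        """Persist dataset; raise if it already exists and clobber is off."""
        self.root.mkdir(parents=True, exist_ok=True)
        path = self.root / f"{name}.pickle"
        if path.exists() and not self.clobber:
            raise FileExistsError(f"{name} already present in {self.root}")
        with open(path, "wb") as f:
            pickle.dump(dataset, f)
```

With clobber left at False, a second put of the same dataset name raises rather than silently overwriting, which is the error behaviour the requirement would make configurable.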
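Finally, a minimal sketch of writing a provenance graph to a file for later harvesting, assuming a simple output-to-(task, inputs) mapping rather than any real SuperTask design:

```python
import json

# Each output dataset maps to the task that made it and its input datasets
# (names here are purely illustrative).
provenance = {
    "coadd/patch42": {
        "task": "makeCoaddTask",
        "inputs": ["calexp/v903334", "calexp/v903336"],
    },
}

with open("provenance.json", "w") as f:
    json.dump(provenance, f, indent=2)  # harvested later by reading this file
```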

...