Meeting began:  @ 10am; ended 14:00 Project time.

Attending

Next meeting  @ 11am

Previous Actions

We went through previous actions and closed them all.

Requirements

The main purpose of this meeting was to sign off on as many requirements as possible, dealing with all outstanding comments.

  • "Local Proxy" had been marked for deletion in the previous meeting. Unknown User (pschella) made a case for keeping it and we agreed that it should be retained at a low priority.
  • Michelle Gower asked if all repositories must be readable/queryable by multiple users. Unknown User (pschella) replied that that is not necessary.
  • Simon Krughoff updated "Aliases to Queries" to use database views.
    • Simon Krughoff to consider whether we need a requirement for parameterizing queries of views. 
  • REQ3/544/Row11 ("Creation of new dataset types") were merged into REQ544.
  • REQ599/85 ("Subsetting a data repository with transfer") were merged. 
  • The title of REQ6 ("Parameterized subset of a dataset") was clarified by Russell Owen.
  • Simon Krughoff updated REQ66 ("Item from composite dataset").
  • REQ8a ("Dataset lookup: provenance driven"). We decided this is not required of all Data Discovery System implementations.
    • Jim Bosch to clarify REQ8a to indicate that there is a place to put provenance information such that it can be queried 
  • REQ9b was deleted as it was not clear whether it added anything that was not already covered.
  • REQ13: filtering of non-datasetref was updated and merged with a related requirement.
  • REQ18-e and the following requirement were unclear as to what "Development DBB" means. In discussion, Michelle Gower noted that "Development DBB" is specifically for development of test pipelines using the batch system and is not generic developer user space. It was decided that we should have a requirement for the DBB to support developer user space with provenance lookups for developer repositories.
    • Simon Krughoff to convert "Development DBB" requirements to explicitly be requirements on the LSST Data Facility and not butler requirements 
  • REQ422 was deleted since it is a duplicate of REQ19.
  • REQ994 was accepted.
  • We discussed the Provenance Graph writing requirement and accepted it after some debate. In particular the discussion section was improved to mention "trickle up" provenance.
  • Michelle Gower wondered if we wanted to require that each put() would persist provenance to ensure that provenance is always available. After some debate we decided against this and felt that treating provenance as part of a composite dataset that can be put independently would be preferred, at least initially. Michelle Gower also wondered if we should have a specialist discussion on Provenance. We definitely should do that but we should use these requirements to seed the discussion with a wider audience.
  • We added a requirement for deleting a repository in a clean and safe manner and, after some debate, separated it into deleting and garbage collection to handle first-class subsets.
  • Michelle Gower asked if we need an explicit requirement for supporting multiple repositories. Jim Bosch felt that this needs thought and that provenance in particular must be able to support this given that output repositories containing processed data will include provenance to input repositories from somewhere else. Brian Van Klaveren wondered if we should have a registry of repositories to simplify data discovery, and in particular DAX will have to know all the repositories that exist in the DAC. Tim Jenness wondered if Data Repositories should have their own URI prefix such as butler: or lsst: (similar to HTTP or ivo:).
  • Michelle Gower noted that we do not have explicit requirements to support authorization and authentication for repository access. We felt that this wasn't a problem, given that DAC access or VOSpace would not work without it and so it is clearly implied.
  • The "Multiple reruns" requirement was discussed and ultimately we agreed to remove it as an implementation detail.
    • Simon Krughoff to ensure that use case SQR1.5a is attached to another requirement.  
    • Simon Krughoff to add a requirement for a user to have a local registry of all the repositories they have access to (possibly updated automatically as each repo is created) so that data discovery can find all their data products.  
  • REQ883: "Execution on archived data" requirement is really a DBB requirement.
    • Simon Krughoff to rewrite REQ883 along the lines of "it shall be possible to request that data from the DBB be retrieved asynchronously and made available in a user workspace".  
    • Jim Bosch to consider whether we need a requirement for DBB files retrieved in REQ883 to be ingestible into a Repository without support files  
  • REQ1999 was discussed as a performance requirement for AP. There was some uncertainty as to whether per-chip performance is the real number. There was also discussion as to whether L1 catch up should use direct database calls, as are done for prompt processing, or whether it would have to use the standard batch processing interface. Does this mean that inserts into L1DB have to be different in the two systems? It was suggested that we follow up with Kian-Tat Lim and Andy Salnikov for clarification, since this might imply that prompt processing does not use butler APIs but catch up does. Russell Owen was concerned that this would complicate the implementation.


Jim Bosch and Unknown User (pschella) then presented a design for the butler based on these requirements. They are working on the detailed proposal in a tech note.

