Meeting began at 10:05am and ended at 11:35am, Project Time.

Attending

Next meeting: 11am  

Requirements

We went through the comments that people have made on the draft requirements.

  • Jim Bosch to refactor the Consistent Discovery Interface requirement into parent/child requirements.
  • Jim Bosch to simplify the Data Repository Layering wording.
  • There was further discussion of provenance as it relates to data discovery. Simon Krughoff worried about how provenance from notebook batch jobs will be tracked if the products are shared among commissioning team members. Tim Jenness felt that, at the very least, the files themselves should contain parent information (the Starlink provenance system also included references to raw data); it should not require a lookup to a metadata service to learn which files went into a coadd (see the provenance sketch below this list). We then discussed how to handle intermediates from notebook processing if those intermediates are never persisted to the Data Backbone.
  • Jim Bosch to reword Sky Tile Definition requirement 
  • Jim Bosch suggested that we add a concept of "Repository Management System".
  • We discussed caching strategies; there are now multiple requirements listed that deal with caching, and pschella wondered if they could be combined.
    • Simon Krughoff mentioned a new Use Case that had been missed: if a notebook is attached to a remote data repository, you expect the first run of the notebook to be slow, but subsequent runs to use the cache (see the caching sketch below this list).
    • Brian Van Klaveren agreed that caching was an important concept for remote data repositories.
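
As a concrete illustration of the remote-repository caching use case above, the following is a minimal read-through cache sketch in Python. Everything here is illustrative: CachingFetcher and fetch_remote are hypothetical stand-ins rather than any actual Butler API. The first get() for a dataset pays the remote transfer cost; repeat calls hit the local cache, matching the slow-first-run, fast-second-run expectation.

```python
import hashlib
import shutil
from pathlib import Path

class CachingFetcher:
    """Read-through cache for datasets in a remote data repository.

    `fetch_remote` is any callable that downloads the dataset at a URI
    to a local temporary file and returns its path; a real client for
    the remote repository would take its place.
    """

    def __init__(self, fetch_remote, cache_dir="~/.repo_cache"):
        self.fetch_remote = fetch_remote
        self.cache_dir = Path(cache_dir).expanduser()
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def get(self, uri: str) -> Path:
        # Key the cache on the dataset URI; a real system would also
        # track versions or checksums to invalidate stale entries.
        key = hashlib.sha256(uri.encode()).hexdigest()
        cached = self.cache_dir / key
        if cached.exists():
            return cached                 # later runs: cache hit, no transfer
        tmp = self.fetch_remote(uri)      # first run: slow remote transfer
        shutil.move(str(tmp), str(cached))
        return cached
```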

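To make the file-embedded provenance idea concrete, here is a minimal sketch that records parent dataset identifiers directly in a FITS header, so the coadd file itself answers "which files went into me?" without a metadata-service lookup. The NPROV/PROVnnnn keyword scheme and the function names are invented for illustration, not an agreed convention.

```python
from astropy.io import fits
import numpy as np

def write_coadd_with_provenance(path, data, parent_ids):
    """Persist a coadd whose header lists every input dataset.

    `parent_ids` would be the identifiers (or URIs) of the inputs
    that went into the coadd.
    """
    hdu = fits.PrimaryHDU(np.asarray(data))
    hdu.header["NPROV"] = (len(parent_ids), "number of input datasets")
    for i, pid in enumerate(parent_ids):
        # PROV0001, PROV0002, ... stay within the 8-character keyword limit
        hdu.header[f"PROV{i + 1:04d}"] = (str(pid), "input dataset id")
    hdu.writeto(path, overwrite=True)

def read_provenance(path):
    """Recover the parent list from the file alone."""
    hdr = fits.getheader(path)
    return [hdr[f"PROV{i + 1:04d}"] for i in range(hdr.get("NPROV", 0))]
```
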
  • Michelle Gower asked whether it makes sense to be able to subset a Data Repository without transferring it. The reply was that a Virtual Data Repository would be a useful concept for allowing multiple processing jobs to share the same subset (a sketch follows this list).
  • Russell Owen started a discussion on L1 Prompt Processing interaction with the Butler and the publishing of alerts. Simon Krughoff assumes that a Butler put() would be used to send the alert packets to Kafka (a sketch of this assumption follows this list); this would have the advantage of making the alert publication process easy to configure. Brian Van Klaveren was worried that this approach would limit the system's ability to handle recoverable error conditions. Catch-up processing was also raised: we were uncertain how it would interact with Kafka, given that it runs within the standard batch processing system and will not ordinarily have access to databases during execution (so it might need to write all data to files). This matters if the assumption is that the L1DB would be populated via Kafka. Russell Owen was concerned about the timelines required for the L1DB to be updated with new DIAObjects.
  • Russell Owen to talk to Kian-Tat Lim about the interaction of the prompt processing and catch-up processing systems with the L1DB.
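
A minimal sketch of the Virtual Data Repository idea above: a view that holds only a reference to its parent repository plus a membership predicate, so creating it transfers no data and several processing jobs can attach to the same subset. The class and the parent's query()/get() interface are hypothetical.

```python
class VirtualRepo:
    """A subset view of a parent data repository.

    No datasets are copied: the view records which parent it draws
    from and a predicate saying which dataset references belong to
    the subset.
    """

    def __init__(self, parent, predicate):
        self.parent = parent        # full repository (local or remote)
        self.predicate = predicate  # e.g. lambda ref: ref["tract"] == 42

    def query(self):
        # Discovery is delegated to the parent, filtered to the subset.
        return (ref for ref in self.parent.query() if self.predicate(ref))

    def get(self, ref):
        if not self.predicate(ref):
            raise LookupError(f"{ref} is not in this virtual repository")
        return self.parent.get(ref)  # reads go straight to the parent
```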

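The sketch below illustrates the assumption discussed above (alert packets sent to Kafka via a Butler-style put()), not an agreed design. It uses the real confluent_kafka Producer API; the class, topic name, and JSON serialization are placeholders. The delivery callback marks where the recoverable-error handling Brian raised (for example, spooling failed packets to files for later catch-up publication) would plug in.

```python
import json
from confluent_kafka import Producer

class KafkaAlertDatastore:
    """Datastore-like backend that publishes alert packets to Kafka.

    If a Butler put() were configured to route alerts here, alert
    publication would become a matter of repository configuration
    rather than pipeline code.
    """

    def __init__(self, brokers, topic="alerts"):
        self.producer = Producer({"bootstrap.servers": brokers})
        self.topic = topic

    def _on_delivery(self, err, msg):
        # Hook for recoverable-error handling: on failure a real system
        # might spool the packet to a file for later publication rather
        # than dropping it.
        if err is not None:
            print(f"delivery failed, would spool to file: {err}")

    def put(self, alert_packet: dict) -> None:
        self.producer.produce(
            self.topic,
            value=json.dumps(alert_packet).encode(),
            callback=self._on_delivery,
        )
        self.producer.flush()  # block until delivered; simplest policy
```
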
Timelines for WG

Jim Bosch and pschella reported that they have been considering the design issues for the Butler in parallel with requirements gathering. They hope to be able to present a strawman in a week to 10 days. Brian Van Klaveren described a layered model for the Butler design and was asked to provide a diagram.

  • Brian Van Klaveren to create a diagram of his proposed layered model design.   

We agreed that the first draft of the requirements will be completed and ready for internal review by
