Agenda:

  • Discussion of use cases
  • Can people meet at 2PM on Thursday next week (24 May) instead of the normal Wednesday meeting?
  • Nomenclature


Use case discussion:

This discussion ended up focusing largely on terminology, as well as on provenance and persistence.

It was decided that certain members of the group will interview people with direct experience drilling into processing problems.

These interviews should be short in duration (no more than an hour) and should be at a level where they can easily be transcribed into a usage narrative like the ones we've been producing for the other contexts.

What follows is a rough transcript of the subsequent discussion about how we should think about metrics and the data that go into them.

Eric: It's very important to think about a smooth transition from SQuaSH to a datastore to drill down capability.

Tim: There are really two kinds of data for QA:

  • Data products in the repository
  • Metric values outside the repo in a QA DB

It needs to be trivial to get from a metric value to the data that went into the computation of that metric value.

Tim and Lauren: Our experience is that storing the data that go into measuring a metric value is important.  Even if the computation is trivial (e.g. a color), it reduces overhead when re-measuring the metric value.  Metric values can be persisted to a per-repository DB and/or a centralized QA DB.

Tim: Note that there is a tradeoff between time spent in preprocessing and time spent in interactive exploration.  Any data that go directly into an aggregated metric should be persisted to make sure we end up with the same data when we drill down later.
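
As a rough illustration of the persistence point above, here is a minimal sketch that stores an aggregated metric value together with the inputs it was computed from, so a later drill down sees exactly the same data. The table name, column names, the use of a median as the aggregate, and the in-memory SQLite database are all assumptions made for this example; none of them were specified in the meeting.

```python
import json
import sqlite3
from statistics import median


def persist_metric(conn, name, inputs):
    """Store an aggregated value (here a median, as an assumption)
    alongside the exact inputs that produced it."""
    value = median(inputs)
    conn.execute(
        "INSERT INTO metric_values (name, value, inputs) VALUES (?, ?, ?)",
        (name, value, json.dumps(inputs)),
    )
    return value


def load_metric(conn, name):
    """Return the stored value and the persisted inputs for a metric."""
    row = conn.execute(
        "SELECT value, inputs FROM metric_values WHERE name = ?", (name,)
    ).fetchone()
    return row[0], json.loads(row[1])


# Hypothetical QA DB; a real one could be per-repository or centralized.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metric_values (name TEXT PRIMARY KEY, value REAL, inputs TEXT)"
)
```

Because the inputs are stored with the value, re-measuring the metric later does not require re-running any preprocessing, which is the overhead reduction Tim and Lauren describe.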

Lauren: Note that there are multiple regimes when it comes to metrics: e.g. KPMs vs. user defined metrics.

Everyone: It's very clear that keeping track of what data went into a metric value is both really important and not obviously trivial to solve.

Eric: If metric values remembered the data ids that go into them, it would give great flexibility for inspecting at increasingly finer granularity.  (Aside by Simon: this seems to say we need to solve the generic provenance problem and make sure it integrates with our verification system.)
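
One way to picture Eric's suggestion is a metric value that carries the data ids of its inputs, which can then be grouped at whatever granularity is of interest. The sketch below is only illustrative: the class name `MetricValue`, the helper `drill_down`, and the data id keys `visit` and `detector` are hypothetical and not part of any existing system discussed in the meeting.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricValue:
    """A metric value that remembers which data ids went into it."""
    name: str
    value: float
    data_ids: tuple  # tuple of dicts, e.g. {"visit": 1, "detector": 4}


def drill_down(metric, key):
    """Group the contributing data ids by one key (e.g. "visit"),
    giving the next finer level at which to inspect the inputs."""
    groups = defaultdict(list)
    for data_id in metric.data_ids:
        groups[data_id[key]].append(data_id)
    return dict(groups)
```

For example, `drill_down(metric, "visit")` would return the contributing data ids grouped per visit, and each group could then be split again by `"detector"`. This does not address the generic provenance problem Simon raises; it only shows the bookkeeping at the level of a single metric value.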

Can people meet at 2PM on Thursday, 24 May?

John Swinbank: the response was a unanimous "yes", though Eric pointed out that there is a pipelines meeting at that time that you both typically attend.  He indicated it wouldn't be a problem to miss it this one time.

Nomenclature

There was some discussion.

  • Simon Krughoff will take on updating the existing glossary to reflect the discussion in the meeting.