1.1. Date

1.2. Attendees

1.3. Agenda

  • Architectural discussion of PPDB – where does it fit and what does it need to do?
  • Possible implementation strategies
  • When is what needed?
  • Adjacent: AP/APDB testing

1.4. Notes

  • Arch overview from KT
    • Building an APDB→PPDB migration path is necessary, since APDB and PPDB are unlikely to share the same DB infrastructure (so can't use off-the-shelf DB replication)
      • Option: drive it directly from online APDB queries?  Or...
      • Option: drive it from files containing updates, written as APDB is updated?
    • Eric: PPDB only required to be updated in <24 hrs?  If we want daytime solar system (SS) processing to have the most recent data, then PPDB has to be updated in a timely fashion...
    • Fritz: How long does MPC need for their update?  Ian, Eric: recall on the order of 3 hrs? (check)
    • Andy S. on pros/cons:
      • Consistency: w/ files, uncertain how consistent they are, and do you have everything you need?  You need reliable storage or your own DB to track...  Would be more robust to make APDB the single source of truth and query it...
      • Another idea to explore: make PPDB same as APDB (Cassandra PPDB?  How would TAP work?  Is this feasible, or too complex?  Would let us leverage Cassandra replication.)
      • Replication queries against APDB may need additional indexing or schema considerations to optimize (so far APDB optimized only for AP)
    • Spencer: explore export via dedicated Cassandra read replica?
      • Andy S.: may be expensive; we could test...
    • Andy: AP could write the files itself (could be integrated with APDB API)
    • Ian: SS will only be sending small slice of PPDB to MPC, and data coming back is a small subset also.  Is this worth splitting out separately?
      • Andy: currently APDB API does not have SS objects, but will certainly need them
      • Eric: we will use and update SS objects during night, makes sense to include in APDB to keep complexity low; KT concurs
      • SS object updates need to flow to PPDB as well.  Again, would be most robust to flow this through APDB (But would that take too long?)
      • PPDB updates need idempotency; interactions with externals (e.g. MPC) may fail/retry
  • Fritz: So who owns building what?
    • Eric: Interactions with MPC seem like AP should be closely involved, other stuff DAX?
    • Colin: APDB to PPDB replication seems a natural fit for DAX.  All agree.
  • Ian: clarification – what is in "AP files" on diagram?
    • KT: everything needed for PPDB update so APDB would not need to be queried
    • Conceived as something like parquet files flowing out of AP
    • Ian: how long do they hang around?  KT: short lived, just long enough to be ingested
  • KT: driver for all this design is interfaces (PPDB end user, and MPC in/out?)  Once we understand those boundaries we can design more
  • Fritz: when is what needed to avoid blocking?
    • Eric: MPC is already prototyping the data interchange.  Recommend checking in with Mario to find out how this is going.
    • Eric: probably need most of this in ~6-9 mos. to support commissioning
    • Colin: we are carrying design risk by not proving out PPDB replication – discharging this sooner rather than later would be best!
  • Ian: where are we standing this up?
    • Eric: recent conversations with Wil imply prototype in IDF, move to SLAC as soon as possible
    • PPDB at IDF seems like the right thing
    • KT: IDF should just use canned inputs.  ComCam-driven AP should be done at SLAC
  • Look to DPDD for PPDB schema – this is what AP has been testing with/to
    • Andy: APDB testing was to this source as well.
    • Updates to DPDD needed?  (Missing columns in DiaSource and DiaObject?)
      • Eric: probably due to time-series and flags...
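
Illustrative sketch (not from the meeting): one way to read the idempotency note above together with the short-lived "AP files" idea is that each update chunk carries an ID, and the PPDB side records which chunks it has already applied, so a retried chunk is a no-op. The example below is a minimal sketch under stated assumptions: sqlite3 stands in for the PPDB, and the applied_chunks table, the chunk_id field, the apply_chunk helper, and the DiaObject column set are all hypothetical names, not the actual APDB/PPDB API or schema.

```python
import sqlite3


def make_ppdb():
    """Create an in-memory stand-in for the PPDB (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE DiaObject ("
        "diaObjectId INTEGER PRIMARY KEY, ra REAL, decl REAL, nDiaSources INTEGER)"
    )
    # Bookkeeping table: one row per update chunk already applied.
    conn.execute("CREATE TABLE applied_chunks (chunk_id INTEGER PRIMARY KEY)")
    return conn


def apply_chunk(conn, chunk_id, rows):
    """Apply one update chunk; return False if it was already applied.

    The chunk marker and the row upserts share one transaction, so a
    failure partway through leaves neither behind and the chunk can be
    retried safely.
    """
    with conn:  # commits on success, rolls back if an exception escapes
        try:
            conn.execute(
                "INSERT INTO applied_chunks (chunk_id) VALUES (?)", (chunk_id,)
            )
        except sqlite3.IntegrityError:
            return False  # duplicate delivery or retry: nothing to do
        # Upsert keyed on the primary key, so re-sent rows overwrite
        # rather than duplicate.
        conn.executemany(
            "INSERT OR REPLACE INTO DiaObject "
            "(diaObjectId, ra, decl, nDiaSources) VALUES (?, ?, ?, ?)",
            rows,
        )
    return True


if __name__ == "__main__":
    ppdb = make_ppdb()
    chunk = [(1, 10.0, -5.0, 3), (2, 11.0, -6.0, 1)]
    print(apply_chunk(ppdb, 100, chunk))  # first delivery
    print(apply_chunk(ppdb, 100, chunk))  # retry of the same chunk
```

The same pattern would apply whether chunks come from files written by AP or from queries against APDB; only the producer of the rows changes.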

1.5. Action items