Need a new term for user-created tables that do not meet the full spectrum of "Level 3" requirements; i.e., when a user uploads a random catalog, what do we call that?
Context: Gregory suggested that a user could make a "Level 3" data product on their home computer, but Frossie suggested this shouldn't be conflated with products stored at the DAC.
Need to clearly specify that we support MyDB-type storage of arbitrary data, which need not be scientifically related to the LSST data products.
Determine how tables will be uploaded to MyDB: via the Portal, the Notebook, or external clients.
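As a point of discussion, the external-client path could be as simple as an authenticated HTTP ingest service that the Portal and Notebook clients also wrap. A minimal sketch, assuming a hypothetical endpoint, URL scheme, and bearer-token auth (none of these are a defined API):

```python
import csv
import io
import urllib.request

def csv_to_rows(csv_text):
    """Parse a user-supplied CSV catalog into a header and data rows,
    as a minimal client-side sanity check before upload."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    return header, [tuple(row) for row in reader]

def upload_table(endpoint, table_name, csv_text, token):
    """PUT the raw CSV to a hypothetical MyDB ingest endpoint.
    The path and auth header below are illustrative assumptions."""
    req = urllib.request.Request(
        f"{endpoint}/mydb/tables/{table_name}",
        data=csv_text.encode("utf-8"),
        headers={"Content-Type": "text/csv",
                 "Authorization": f"Bearer {token}"},
        method="PUT",
    )
    return urllib.request.urlopen(req)
```

The same service could equally accept VOTable or parquet payloads; the content type is the only part of the sketch that would change.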
There was a discussion of "direct access to the consolidated database", beyond the capabilities of the IVOA protocols. Need to follow up on whether this would be mediated by DAX services or whether some form of more direct connection to the consolidated DB is acceptable.
The Science Platform requirements say that the MyDB requirements are "still in development".
Scope the effort for providing user-facing crossmatches/comparisons with other surveys, beyond those already in use for Pipelines validation.
What are the sizes of user-generated data holdings in existing archives? Our 10% allocation for users is probably too small.
Leanne Guy will ask the Gaia archive for its user data volumes.
What is the story we will tell reviewers for how external groups could leverage additional compute/storage, of their own acquisition, beyond the 10%? What development requirements does such a system impose on us, and what decisions do we need to make to support this "beyond 10%" capability? Do we have appropriate requirements in the baseline, including for evolving this capability N years into operations?
Leanne Guy will write a tech note listing some "Level 3"/"MyDB" science use cases and present a first version at the SST/DMLT F2F in Nov 2018.
Arch and Blockers
Raise at DMLT the question of whether Google is a required deployment target.
Raise at DMLT whether parquet support should be added to the Portal.
Having parquet replication instead of replication inside of qserv has an impact on operational recovery; are we willing to accept this?
Implicit in the parquet+dask/spark scenario is that the data would be stored separately from the compute resources. Are the network resources sufficient to support this?
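The network question above can be framed with back-of-envelope arithmetic. All numbers below are illustrative assumptions for discussion, not project baselines:

```python
# Sustained bandwidth needed to scan a parquet-resident catalog from
# remote storage within a wall-clock budget. Illustrative numbers only.

catalog_tib = 500        # assumed compressed parquet catalog size, TiB
scan_hours = 8           # assumed acceptable time for one full scan
column_fraction = 0.1    # fraction of columns a typical query reads
                         # (parquet's columnar layout lets us skip the rest)

bytes_read = catalog_tib * 2**40 * column_fraction
required_gbps = bytes_read * 8 / (scan_hours * 3600) / 1e9

print(f"required sustained bandwidth: {required_gbps:.1f} Gbit/s")
# prints: required sustained bandwidth: 15.3 Gbit/s
```

Even with column pruning, this is aggregate storage-to-compute bandwidth that must be sustained for hours, and it scales linearly with concurrent scans; the provisioned network would need headroom well above the single-scan figure.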