Seamless Transitions

Goal: find or create data in one Aspect, then view or analyze it in another, for all pairs of Aspects

Queries and their results need to be transportable across Aspects

  • Queries built in the Portal go to a per-user query history (with the query text); for some queries, the results themselves are preserved, subject to limits on result size and an expiration time
  • API Aspect is responsible for maintaining this
  • User may want to annotate the query; that would be a nice-to-have
  • The Notebook can use TAP to search the query history and then repeat a TAP query, either by retrieving the results via the async job id or by resubmitting the ADQL text
  • Would be nice to have a query history widget in the notebook
  • Swagger UI lets you choose a language and HTTP library and then inserts the corresponding code; this could be a model for replicating Portal queries in the Notebook
  • Using query id in history is not transportable to other users or environments
    • Could store results externally somewhere (in notebook, in public-visible VOSpace, etc.)
    • Query history widget could save ADQL text to improve transportability
  • Can cache query results by ADQL text, but there are worries about resource management and authorization
  • Could we have public and private queries? Possibly, but more complicated
  • Our User Workspace is not designed to support DOIs; possibly publishing/registering with the/a DBB might provide the appropriate level of permanence
  • Portal could save Python needed to retrieve data currently being viewed from within a notebook (e.g. in a VOSpace file)
  • An ADQL query run in the Notebook can have its results browsed in the Portal; an async TAP query returns a job id, so the user can either browse the query history in the Portal or paste the async id into a box; alternatively, a Firefly API can be used to send tabular data or a "search" to the Portal
  • (Synchronous requests for shared scan queries will fail)
  • Would be nice if results can be preserved as a database table — yes, the user should always be able to CREATE (DISTRIBUTED) TABLE AS SELECT or do something similar (returning SQLite files could be equivalent)
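
As a rough illustration of the "returning SQLite files could be equivalent" idea above, a Notebook could persist a query's result rows to a standalone SQLite file that can be shared or re-queried later. This is a sketch only; the table name, columns, and values are invented, and the rows stand in for results already fetched from an async TAP job:

```python
import os
import sqlite3

# Hypothetical result set already fetched from an async TAP job
# (column names and values are invented for illustration).
rows = [(1, 150.1, 2.2), (2, 150.3, 2.4)]

# Start from a clean file so the example is repeatable.
path = "query_results.sqlite"
if os.path.exists(path):
    os.remove(path)

# Persist the results as a table in a standalone SQLite file --
# roughly the moral equivalent of CREATE TABLE AS SELECT.
con = sqlite3.connect(path)
con.execute("CREATE TABLE results (objectId INTEGER, ra REAL, dec REAL)")
con.executemany("INSERT INTO results VALUES (?, ?, ?)", rows)
con.commit()

# Anyone receiving the file can re-query it locally.
count = con.execute("SELECT COUNT(*) FROM results").fetchone()[0]
con.close()
```

Because the file travels with the data, this sidesteps the non-transportability of query ids noted above, at the cost of the result no longer being live in the database.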

The Firefly data model is read/write accessible from a notebook, and the Portal can be driven from a notebook

User Workspace (File and DB) needs to be visible across Aspects

Nice to have:

  • Using Firefly visualizations and widgets in a notebook; SUIT would be packaging these, not SQuaRE

Blockers/Worries

What is the resource management model for Notebooks?

  • Small birthright allocation ("small laptop"); cannot provide 16 GB of memory at this time
  • If allocated more resources, can use more storage or CPU in batch
  • It is difficult for the Notebook Aspect to allocate more than two containers to a given user (a capability intended for regression testing) or to allocate more resources to a particular user
  • Frossie Economou to document this limitation in the design; will call this out when reviewed

Portal would like a VOSpace API to run against

  • Fritz Mueller can look into providing this (unauthenticated test service)

DataLab wrote their own Python API to VOSpace; do we know why?

  • Community is working on this, but not clear how much we need this (primarily for remote access from notebooks)
  • In general we're trying to avoid writing clients
  • SQuaRE could write clients if there isn't a suitable one

Results caching as above

  • Fritz Mueller to ensure that a strawman, speculative proposal is in the design document
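
One strawman shape a by-ADQL-text cache key could take is hashing a normalized form of the query, so that trivially different spellings hit the same cache entry. This is purely illustrative, not the proposal to be documented; the normalization rule here (whitespace collapsing only) is deliberately minimal, and any real design would still need the per-user authorization checks worried about above:

```python
import hashlib
import re

def adql_cache_key(adql: str) -> str:
    """Derive a cache key from ADQL text (illustrative sketch).

    Only whitespace is normalized; case-folding keywords would
    require tokenizing around string literals, skipped here.
    """
    normalized = re.sub(r"\s+", " ", adql.strip())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# Two spellings of the same query map to the same cache key.
k1 = adql_cache_key("SELECT ra, dec FROM Object WHERE objectId = 42")
k2 = adql_cache_key("  SELECT ra,  dec\nFROM Object WHERE objectId = 42 ")
```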

Is a two-part MyDB adequate?

  • One part is in the Consolidated Database and is not joinable with Qserv but can be updated dynamically
  • One part is write-once replicated or distributed tables in Qserv that can be joined with other Qserv tables (including spatial joins with distributed tables) but not the Consolidated Database
  • Data could come from outside or could come from queries on existing tables
  • Fritz Mueller to document that there is a user-friendly loading tool for external data, including providing sharding information, and to document where TAP can be used to create tables
  • Gregory Dubois-Felsmann to ask scientists whether they require more than that, such as CREATE TABLE ... AS SELECT or moving tables from the Consolidated Database to Qserv
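
The "data could come from queries on existing tables" case above is essentially the CREATE TABLE ... AS SELECT pattern. A small sketch of what that looks like, using SQLite as a stand-in for the real databases (the catalog table, columns, and cut are all invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Stand-in for an existing catalog table (e.g. an Object table).
con.execute("CREATE TABLE object (objectId INTEGER, ra REAL, dec REAL, mag REAL)")
con.executemany(
    "INSERT INTO object VALUES (?, ?, ?, ?)",
    [(1, 150.1, 2.2, 21.5), (2, 150.3, 2.4, 24.9)],
)

# A user-created MyDB table populated from a query on existing tables,
# i.e. the CREATE TABLE ... AS SELECT pattern under discussion.
con.execute(
    "CREATE TABLE my_bright AS "
    "SELECT objectId, ra, dec FROM object WHERE mag < 22"
)
n = con.execute("SELECT COUNT(*) FROM my_bright").fetchone()[0]
con.close()
```

In the two-part MyDB described above, where such a table lands (Consolidated Database vs. Qserv) determines whether it is dynamically updatable or joinable with the distributed catalogs.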

HIPS server?

  • There may be a reason to release a coarse representation to the public

Who will provide for regeneration of Python objects from database query results?

  • Butler has to do it from some kind of query
  • But is not required to do it using TAP

Makes sense to generate a HEALPix index for catalogs

  • May need a qserv_areaspec_healpix function
  • Will be part of the HEALPix/HIPS RFC

Are large numbers of simultaneous large transfers via WebDAV adequately efficient?

  • Likely OK, will be tested, but unclear if everyone realizes what is being provided
  • Design document will make clear what data download and upload mechanisms are provided from and to the User Workspace
  • Kian-Tat Lim to ensure that the current baseline for the Bulk Distribution Service (which does not talk to the User Workspace) is documented in LDM-148

How to handle high-volume Object-related tables from Data Release to Data Release (using match tables)?

  • Gregory Dubois-Felsmann to get someone to write up a use case for handling high-volume Object-related tables from Data Release to Data Release (using match tables)
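
One concrete shape such a use case might take is comparing measurements of the same source across two Data Releases by joining their Object tables through a match table. This sketch uses SQLite with invented table and column names purely to illustrate the join pattern:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE dr1_object (objectId INTEGER, mag REAL)")
con.execute("CREATE TABLE dr2_object (objectId INTEGER, mag REAL)")
# Match table pairing DR1 objects with their DR2 counterparts;
# object ids are not stable across Data Releases.
con.execute("CREATE TABLE dr1_dr2_match (dr1_objectId INTEGER, dr2_objectId INTEGER)")

con.execute("INSERT INTO dr1_object VALUES (10, 21.0)")
con.execute("INSERT INTO dr2_object VALUES (900, 20.8)")
con.execute("INSERT INTO dr1_dr2_match VALUES (10, 900)")

# Compare a measurement across releases via the match table.
delta = con.execute(
    "SELECT d2.mag - d1.mag FROM dr1_object d1 "
    "JOIN dr1_dr2_match m ON d1.objectId = m.dr1_objectId "
    "JOIN dr2_object d2 ON d2.objectId = m.dr2_objectId"
).fetchone()[0]
con.close()
```

At full catalog scale these joins are high-volume, which is what makes the use case worth writing up.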

Long-Range Planning

Take a day during DM All Hands in March to work on LSP

Integration and planning meeting around the end of September / beginning of October 2018 at NCSA

Short-Range Planning

Requirements and design document contributions from all teams (including Arch) by 2017-12-15