Seamless Transitions
Goal: find/create data in one Aspect, view or analyze data in another Aspect (for all pairs of Aspects)
Queries and their results need to be transportable across Aspects
- Query built in Portal goes to per-user query history (with query text); for some queries, results are also preserved, subject to limits on result size and an expiration time
- API Aspect is responsible for maintaining this
- User may want to annotate the query; that would be a nice-to-have
- Notebook can use TAP to search query history and then repeat TAP query by retrieving results by async id or by resubmitting ADQL text
- Would be nice to have a query history widget in the notebook
- Swagger UI allows you to choose language, HTTP library and inserts code; could be a model for replicating Portal queries in the Notebook
- Using query id in history is not transportable to other users or environments
- Could store results externally somewhere (in notebook, in public-visible VOSpace, etc.)
- Query history widget could save ADQL text to improve transportability
- Can cache query results by ADQL text, but there are worries about resource management and authorization
- Could we have public and private queries? Possibly, but more complicated
- Our User Workspace is not designed to support DOIs; possibly publishing/registering with the/a DBB might provide the appropriate level of permanence
- Portal could save Python needed to retrieve data currently being viewed from within a notebook (e.g. in a VOSpace file)
- ADQL query in the notebook can have results browsed in the Portal; TAP query returns an id if async; either browse query history in the Portal or paste the async id into a box; can use a Firefly API to send tabular data or "search" to the Portal
- (Synchronous requests for shared scan queries will fail)
- Would be nice if results can be preserved as a database table — yes, the user should always be able to CREATE (DISTRIBUTED) TABLE AS SELECT or do something similar (returning SQLite files could be equivalent)
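Sketch of the notebook-side replay discussed above (retrieve by async id, else resubmit the ADQL text). This is only an illustration: it assumes the service follows the standard IVOA TAP/UWS layout for async jobs, the base URL and the history-entry fields (async_id, adql) are hypothetical, and it only builds the requests without sending them.

```python
from urllib.parse import quote, urlencode

# Hypothetical TAP base URL; the real service location is not given in these notes.
TAP_BASE = "https://lsp.example.org/api/tap"


def async_result_url(job_id, base=TAP_BASE):
    """URL of the result of an existing async job (standard UWS/TAP path layout)."""
    return f"{base}/async/{quote(str(job_id), safe='')}/results/result"


def resubmit_request(adql, base=TAP_BASE):
    """Endpoint and form body for submitting the saved ADQL text as a new async job."""
    params = {"REQUEST": "doQuery", "LANG": "ADQL", "QUERY": adql}
    return f"{base}/async", urlencode(params)


def replay(entry):
    """Prefer the stored async id; fall back to the ADQL text.

    `entry` is a hypothetical query-history record with the keys
    "async_id" (may be None) and "adql".
    """
    if entry.get("async_id"):
        return ("GET", async_result_url(entry["async_id"]), None)
    url, body = resubmit_request(entry["adql"])
    return ("POST", url, body)
```

The fallback path is the transportable one: an async id only means something to the service and user that created it, while the ADQL text can be resubmitted by another user or in another environment.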
Firefly data model is read/write accessible from a notebook, and the Portal can be driven from a notebook
User Workspace (File and DB) needs to be visible across Aspects
Nice to have:
- Using Firefly visualizations and widgets in a notebook; SUIT would be packaging these, not SQuaRE
Blockers/Worries
What is the resource management model for Notebooks?
- Small birthright allocation ("small laptop"); cannot provide 16 GB of memory at this time
- If allocated more resources, can use more storage or CPU in batch
- Difficult for Notebook to allocate more than two containers to a given user (intended for regression testing) or to allocate more resources to a particular user
- Frossie Economou to document this limitation in the design; will call this out when reviewed
Portal would like a VOSpace API to run against
- Fritz Mueller can look into providing this (unauthenticated test service)
DataLab wrote their own Python API to VOSpace; do we know why?
- Community is working on this, but not clear how much we need this (primarily for remote access from notebooks)
- In general we're trying to avoid writing clients
- SQuaRE could write clients if there isn't a suitable one
Results caching as above
- Fritz Mueller to ensure that a strawman, speculative proposal is in the design document
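One way the caching idea might look, purely as a strawman for discussion and not the proposal referred to above. It assumes results are keyed by (user, ADQL text) so that one user's cached results are never served to another, and that the resource worry is handled by a size limit and an expiration time. The class and parameter names are hypothetical.

```python
import hashlib
import json
import time


def cache_key(adql, user):
    """Key on the user as well as the ADQL text, so entries are private per user."""
    # json.dumps gives an unambiguous encoding of the (user, text) pair.
    payload = json.dumps([user, adql.strip()]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class ResultCache:
    """In-memory sketch: refuses oversized results and expires old entries."""

    def __init__(self, max_bytes, ttl_seconds, clock=time.monotonic):
        self.max_bytes = max_bytes
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries = {}

    def put(self, adql, user, result):
        """Store `result` (bytes); return False if it exceeds the size limit."""
        if len(result) > self.max_bytes:
            return False
        expires = self.clock() + self.ttl_seconds
        self._entries[cache_key(adql, user)] = (expires, result)
        return True

    def get(self, adql, user):
        """Return the cached bytes, or None if absent or expired."""
        key = cache_key(adql, user)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires, result = entry
        if self.clock() >= expires:
            del self._entries[key]
            return None
        return result
```

Only leading and trailing whitespace is stripped before hashing; more aggressive normalization of the ADQL text would risk mapping different queries (for example, ones differing inside a string literal) to the same cache entry.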
Is a two-part MyDB adequate?
- One part is in the Consolidated Database and is not joinable with Qserv but can be updated dynamically
- One part is write-once replicated or distributed tables in Qserv that can be joined with other Qserv tables (including spatial joins with distributed tables) but not the Consolidated Database
- Data could come from outside or could come from queries on existing tables
- Fritz Mueller to document that there is a user-friendly loading tool for external data (including a way to provide sharding information), and where TAP can be used to create tables
- Gregory Dubois-Felsmann to ask scientists whether they require more than that, such as CREATE TABLE FROM SELECT or moving data from Consolidated to Qserv
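For reference when asking scientists, this is the kind of operation meant by creating a table from a query. The sketch uses an in-memory SQLite database purely as a stand-in (it says nothing about how Qserv or the Consolidated Database would implement it), and the table and column names are made up.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a user database.
conn = sqlite3.connect(":memory:")

# Hypothetical source catalog with a few rows.
conn.execute("CREATE TABLE object (id INTEGER PRIMARY KEY, ra REAL, decl REAL, mag REAL)")
conn.executemany(
    "INSERT INTO object VALUES (?, ?, ?, ?)",
    [(1, 10.0, -5.0, 21.5), (2, 10.1, -5.1, 24.0), (3, 200.0, 30.0, 19.2)],
)

# Preserve the result of a query as a new table in the user's space.
conn.execute("CREATE TABLE bright AS SELECT id, ra, decl FROM object WHERE mag < 22.0")

bright_ids = [row[0] for row in conn.execute("SELECT id FROM bright ORDER BY id")]
print(bright_ids)  # [1, 3]
```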
HIPS server?
- There may be a reason to release a coarse representation to the public
- Gregory Dubois-Felsmann is writing an RFC for HIPS, for generating coverage maps in MOC format, and for serving these; RFC-441 has been created
- Gregory Dubois-Felsmann will write a second RFC about public access if needed
Who will provide for regeneration of Python objects from database query results?
- Butler has to do it from some kind of query
- But is not required to do it using TAP
Makes sense to generate a HEALPix index for catalogs
- Need to maybe have a qserv_areaspec_healpix
- Will be part of the HEALPix/HIPS RFC
Are large numbers of simultaneous large transfers via WebDAV adequately efficient?
- Likely OK, will be tested, but unclear if everyone realizes what is being provided
- Design document will make clear what data download and upload mechanisms are provided from and to the User Workspace
- Kian-Tat Lim to ensure that the current baseline for the Bulk Distribution Service (which does not talk to the User Workspace) is documented in LDM-148
How to handle high-volume Object-related tables from Data Release to Data Release (using match tables)?
- Gregory Dubois-Felsmann to get someone to write up a use case for handling high-volume Object-related tables from Data Release to Data Release (using match tables)
Long-Range Planning
Take a day during DM All Hands in March to work on LSP
Integration and planning meeting around end of September/beginning of October 2018 at NCSA
Short-Range Planning
Requirements and design document contributions from all teams (including Arch) by 2017-12-15