Seamless Transitions

Goal: find or create data in one Aspect, then view or analyze it in another, for all pairs of Aspects

Queries and their results need to be transportable across Aspects

  • Queries built in the Portal go to a per-user query history (with the query text); for some queries, the results themselves are preserved, subject to limits on result size and an expiration time
  • API Aspect is responsible for maintaining this
  • User may want to annotate the query; that would be a nice-to-have
  • The Notebook can use TAP to search the query history and then repeat a TAP query, either by retrieving the results via the async job id or by resubmitting the ADQL text
  • Would be nice to have a query history widget in the notebook
  • Swagger UI lets you choose a language and HTTP library and then inserts the corresponding code; this could be a model for replicating Portal queries in the Notebook
  • Using query id in history is not transportable to other users or environments
    • Could store results externally somewhere (in notebook, in public-visible VOSpace, etc.)
    • Query history widget could save ADQL text to improve transportability
  • Can cache query results by ADQL text, but there are worries about resource management and authorization
  • Could we have public and private queries? Possibly, but more complicated
  • Our User Workspace is not designed to support DOIs; possibly publishing/registering with the/a DBB might provide the appropriate level of permanence
  • Portal could save Python needed to retrieve data currently being viewed from within a notebook (e.g. in a VOSpace file)
  • An ADQL query run in the Notebook can have its results browsed in the Portal; an async TAP query returns a job id, so the user can either browse the query history in the Portal or paste the async id into a box; alternatively, a Firefly API can be used to send tabular data or a "search" to the Portal
  • (Synchronous requests for shared scan queries will fail)
  • Would be nice if results can be preserved as a database table — yes, the user should always be able to CREATE (DISTRIBUTED) TABLE AS SELECT or do something similar (returning SQLite files could be equivalent)
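
As a rough illustration of the "returning SQLite files could be equivalent" idea above, a Notebook could persist a query's result rows to a standalone SQLite file that can be shared or re-queried later. This is a sketch only; the table name, columns, and values are invented, and the rows stand in for results already fetched from an async TAP job:

```python
import os
import sqlite3

# Hypothetical result set already fetched from an async TAP job
# (column names and values are invented for illustration).
rows = [(1, 150.1, 2.2), (2, 150.3, 2.4)]

# Start from a clean file so the example is repeatable.
path = "query_results.sqlite"
if os.path.exists(path):
    os.remove(path)

# Persist the results as a table in a standalone SQLite file --
# roughly the moral equivalent of CREATE TABLE AS SELECT.
con = sqlite3.connect(path)
con.execute("CREATE TABLE results (objectId INTEGER, ra REAL, dec REAL)")
con.executemany("INSERT INTO results VALUES (?, ?, ?)", rows)
con.commit()

# Anyone receiving the file can re-query it locally.
count = con.execute("SELECT COUNT(*) FROM results").fetchone()[0]
con.close()
```

Because the file travels with the data, this sidesteps the non-transportability of query ids noted above, at the cost of the result no longer being live in the database.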

The Firefly data model is read/write accessible from a notebook, and the Portal can be driven from a notebook

User Workspace (File and DB) needs to be visible across Aspects

Nice to have:

  • Using Firefly visualizations and widgets in a notebook; SUIT would be packaging these, not SQuaRE

Blockers/Worries

What is the resource management model for Notebooks?

  • Small birthright allocation ("small laptop"); cannot provide 16 GB of memory at this time
  • If allocated more resources, can use more storage or CPU in batch
  • It is difficult for the Notebook Aspect to allocate more than two containers to a given user (a capability intended for regression testing) or to allocate more resources to a particular user
  • Frossie Economou to document this limitation in the design; will call this out when reviewed

Portal would like a VOSpace API to run against

  • Fritz Mueller can look into providing this (unauthenticated test service)

DataLab wrote their own Python API to VOSpace; do we know why?

  • Community is working on this, but not clear how much we need this (primarily for remote access from notebooks)
  • In general we're trying to avoid writing clients
  • SQuaRE could write clients if there isn't a suitable one

Results caching as above

  • Fritz Mueller to ensure that a strawman, speculative proposal is in the design document
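
One strawman shape a by-ADQL-text cache key could take is hashing a normalized form of the query, so that trivially different spellings hit the same cache entry. This is purely illustrative, not the proposal to be documented; the normalization rule here (whitespace collapsing only) is deliberately minimal, and any real design would still need the per-user authorization checks worried about above:

```python
import hashlib
import re

def adql_cache_key(adql: str) -> str:
    """Derive a cache key from ADQL text (illustrative sketch).

    Only whitespace is normalized; case-folding keywords would
    require tokenizing around string literals, skipped here.
    """
    normalized = re.sub(r"\s+", " ", adql.strip())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# Two spellings of the same query map to the same cache key.
k1 = adql_cache_key("SELECT ra, dec FROM Object WHERE objectId = 42")
k2 = adql_cache_key("  SELECT ra,  dec\nFROM Object WHERE objectId = 42 ")
```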

Is a two-part MyDB adequate?

  • One part is in the Consolidated Database and is not joinable with Qserv but can be updated dynamically
  • One part is write-once replicated or distributed tables in Qserv that can be joined with other Qserv tables (including spatial joins with distributed tables) but not the Consolidated Database
  • Data could come from outside or could come from queries on existing tables
  • Fritz Mueller to document that there is a user-friendly loading tool for external data, including providing sharding information, and to document where TAP can be used to create tables
  • Gregory Dubois-Felsmann to ask scientists whether they require more than that, such as CREATE TABLE ... AS SELECT or moving tables from the Consolidated Database to Qserv
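
The "data could come from queries on existing tables" case above is essentially the CREATE TABLE ... AS SELECT pattern. A small sketch of what that looks like, using SQLite as a stand-in for the real databases (the catalog table, columns, and cut are all invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Stand-in for an existing catalog table (e.g. an Object table).
con.execute("CREATE TABLE object (objectId INTEGER, ra REAL, dec REAL, mag REAL)")
con.executemany(
    "INSERT INTO object VALUES (?, ?, ?, ?)",
    [(1, 150.1, 2.2, 21.5), (2, 150.3, 2.4, 24.9)],
)

# A user-created MyDB table populated from a query on existing tables,
# i.e. the CREATE TABLE ... AS SELECT pattern under discussion.
con.execute(
    "CREATE TABLE my_bright AS "
    "SELECT objectId, ra, dec FROM object WHERE mag < 22"
)
n = con.execute("SELECT COUNT(*) FROM my_bright").fetchone()[0]
con.close()
```

In the two-part MyDB described above, where such a table lands (Consolidated Database vs. Qserv) determines whether it is dynamically updatable or joinable with the distributed catalogs.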

HIPS server?

  • There may be a reason to release a coarse representation to the public

Who will provide for regeneration of Python objects from database query results?

  • Butler has to do it from some kind of query
  • But is not required to do it using TAP

Makes sense to generate a HEALPix index for catalogs

  • May need a qserv_areaspec_healpix function
  • Will be part of the HEALPix/HIPS RFC

Are large numbers of simultaneous large transfers via WebDAV adequately efficient?

  • Likely OK, will be tested, but unclear if everyone realizes what is being provided
  • Design document will make clear what data download and upload mechanisms are provided from and to the User Workspace
  • Kian-Tat Lim to ensure that the current baseline for the Bulk Distribution Service (which does not talk to the User Workspace) is documented in LDM-148

How to handle high-volume Object-related tables from Data Release to Data Release (using match tables)?

  • Gregory Dubois-Felsmann to get someone to write up a use case for handling high-volume Object-related tables from Data Release to Data Release (using match tables)
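
One concrete shape such a use case might take is comparing measurements of the same source across two Data Releases by joining their Object tables through a match table. This sketch uses SQLite with invented table and column names purely to illustrate the join pattern:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE dr1_object (objectId INTEGER, mag REAL)")
con.execute("CREATE TABLE dr2_object (objectId INTEGER, mag REAL)")
# Match table pairing DR1 objects with their DR2 counterparts;
# object ids are not stable across Data Releases.
con.execute("CREATE TABLE dr1_dr2_match (dr1_objectId INTEGER, dr2_objectId INTEGER)")

con.execute("INSERT INTO dr1_object VALUES (10, 21.0)")
con.execute("INSERT INTO dr2_object VALUES (900, 20.8)")
con.execute("INSERT INTO dr1_dr2_match VALUES (10, 900)")

# Compare a measurement across releases via the match table.
delta = con.execute(
    "SELECT d2.mag - d1.mag FROM dr1_object d1 "
    "JOIN dr1_dr2_match m ON d1.objectId = m.dr1_objectId "
    "JOIN dr2_object d2 ON d2.objectId = m.dr2_objectId"
).fetchone()[0]
con.close()
```

At full catalog scale these joins are high-volume, which is what makes the use case worth writing up.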

Long-Range Planning

Take a day during DM All Hands in March to work on LSP

Integration and planning meeting around the end of September / beginning of October 2018 at NCSA

Short-Range Planning

Requirements and design document contributions from all teams (including Arch) by 2017-12-15