This page needs to be updated to be consistent with LSE-319 and
The LSST Science Platform (LSP) is the environment provided for user access to and analysis of LSST and user-generated data products.
It presents three "Aspects" to the user:
- The Portal Aspect: a Web application that provides user-friendly, interactive query and visualization tools for catalog and image data;
- The Notebook Aspect: a JupyterLab-based, personalizable environment for interactive Python coding, with access to the LSST data products and additional user computing and storage resources; and
- The API Aspect: a set of Web-based APIs, primarily following IVOA standards, that provide access to the LSST and user-generated catalog and image data products.
These rely on an underlying infrastructure that provides:
- User computing, including resources supporting parallel, next-to-the-data analysis;
- A "User Workspace" comprising both a file-oriented storage system and a user database system, each accessible from all three Aspects;
- Database services (including both Qserv and conventional RDBMS components);
- Pre-built releases of the LSST Science Pipelines software stack and other commonly-used Python software, such as Astropy; and
- Authentication and authorization and other support services.
Users can use the Portal Aspect to do queries, visualizations, and UI-driven interactive exploratory data analysis; or they can work in the Notebook Aspect to do Python-based ad hoc data retrieval, analysis, and visualization, using tools of the user's choice. Whether from a notebook at the Data Access Center or remotely from an external personal or institutional system, they can also use the Web interfaces of the API Aspect for data access, either programmatically or through external IVOA-compatible tools such as TOPCAT.
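As a rough illustration of programmatic access through the API Aspect, the sketch below builds a synchronous query URL, assuming the catalog service is exposed through the IVOA Table Access Protocol (TAP) and using the TAP 1.0 parameter names (`REQUEST`, `LANG`, `FORMAT`, `QUERY`). The base URL, table name, and column names are placeholders invented for this example and are not defined on this page; no request is actually sent.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the real LSP TAP URL is not given on this page.
TAP_BASE_URL = "https://lsp.example.org/api/tap"


def build_tap_sync_url(base_url: str, adql: str, fmt: str = "votable") -> str:
    """Return a TAP synchronous-query URL for the given ADQL string."""
    params = {
        "REQUEST": "doQuery",  # TAP 1.0 request type for running a query
        "LANG": "ADQL",        # query language
        "FORMAT": fmt,         # requested output format
        "QUERY": adql,         # the ADQL text, URL-encoded by urlencode
    }
    return f"{base_url.rstrip('/')}/sync?{urlencode(params)}"


# Placeholder schema, table, and column names, for illustration only.
query = "SELECT TOP 10 ra, decl FROM example_schema.Object"
url = build_tap_sync_url(TAP_BASE_URL, query)
print(url)
```

The same URL could be issued from a notebook, from a remote script, or pasted into any HTTP client; tools such as TOPCAT construct equivalent requests internally.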
All three of the Aspects are, in effect, front ends to the databases and files that will ultimately reside in the Data Backbone and that are shared across Aspects. In the short run, the databases and files live in non-Data Backbone systems.
The system is designed to facilitate analysis workflows that cross from Aspect to Aspect. For instance, the results of a query performed in the UI-driven Portal Aspect can be accessed in the Notebook Aspect via a combination of a simple UI action and Python function invocation; data created from a user analysis in the Notebook Aspect can be visualized in the Portal Aspect, and so on.
We will simultaneously operate multiple distinct instances of this Science Platform. Each instance has its own list of authorized users, update cadence, and upgrade procedures.
Currently (2019) we operate two instances on hardware located at NCSA: a "stable" instance aimed at providing services both internally and to a set of "friendly" users, e.g., from the commissioning team and from Science Collaborations, who wish to familiarize themselves with the Science Platform environment as it continues to develop; and an "integration" instance that provides a platform for testing of new features and fixes for each Aspect in an environment in which the others are available, and that is periodically used for end-to-end tests of the entire Science Platform environment.
Moving into the Commissioning and Operations phases of the project, these instances will be augmented by additional ones:
- Chilean Data Access Center for science users for released data products.
- US Data Access Center for science users for released data products.
- Internal QA of L1 and L2 productions in the production environment. This instance has access to the published Data Backbone contents at the Archive but also has specialized access to unreleased intermediate data products and the internal, unreleased, incrementally-loaded Qserv instance for the next Data Release. In Operations, this instance primarily supports Science Ops. It can also be used by the Commissioning Team. It might have customized portal pages or other components not normally provided in the DAC instances above.
- Commissioning Cluster at the Base with low-latency access to the Data Backbone endpoint there. This instance primarily supports the Commissioning Team. Any customizations for the QA instance should be available here as well.
The integration instance, at least, will continue to exist indefinitely to support pre-rollout final testing of updates to any of the Aspects or other LSP components.
There may be operational models and requirement relaxations under which some or even all of these instances could be combined.
Initial deliveries of the platform use simple, less-functional components. Later upgrades will improve the components. The initial delivery of (a) a basic Portal integrated with prototype API Aspect services, and (b) a minimally functional notebook-mode QA instance is targeted for some time in Calendar 2017. The delivery of an initial fully integrated version of the other capabilities is targeted for November 2019 in order to precede the start of obtaining on-sky data with ComCam.
The remainder of this page needs to be updated to be consistent with recent developments, LSE-319, and
The initial version of this is the PDAC.
Initial components include:
- SUIT query/visualization portal using Firefly — data retrieved via DAX web services
- DAX web services
  - Raw ADQL/SQL interface, output format translation
  - Talks to Qserv
  - Queries databases (e.g. ScienceCcdExposure table)
  - Generates lists of Butler ids (dataset type plus dataId)
  - imgserv - mosaic/cutout, regeneration, output format translation operations
- Files in GPFS with organization as prescribed in RFC-95, RFC-249
- Qserv database
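The component list above describes a Butler id as a dataset type plus a dataId. A minimal sketch of that pairing, and of turning database query rows into a list of such ids, might look like the following. The `ButlerId` class, the `butler_ids_from_rows` helper, and the sample rows are all hypothetical and are not part of the DAX services or the Butler API; the `visit` and `ccd` keys are used only as plausible dataId keys.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Mapping, Sequence


@dataclass
class ButlerId:
    """Hypothetical container: a dataset type plus the dataId that selects it."""
    dataset_type: str
    data_id: Dict[str, Any]


def butler_ids_from_rows(
    rows: Iterable[Mapping[str, Any]],
    dataset_type: str,
    key_columns: Sequence[str],
) -> List[ButlerId]:
    """Keep only the dataId key columns from each row and tag them with a dataset type."""
    return [
        ButlerId(dataset_type, {key: row[key] for key in key_columns})
        for row in rows
    ]


# Made-up rows standing in for the result of an exposure-table query.
rows = [
    {"visit": 1234, "ccd": 42, "filter": "r", "zeropoint": 27.1},
    {"visit": 1235, "ccd": 7, "filter": "i", "zeropoint": 27.3},
]
ids = butler_ids_from_rows(rows, "calexp", ("visit", "ccd"))
for butler_id in ids:
    print(butler_id)
```

Separating the key columns from the rest of the row keeps the id small and independent of whatever extra metadata the query happened to return.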
The Data Backbone will manage the files, replacing the direct GPFS interface (GPFS will still be used underneath). It will perform inter-site replication and transparent (except for latency) retrieval of files from the tape archive. The Butler must be able to retrieve files from the Data Backbone. This can be a staged process (requesting files through a translation dbbToButler utility) and then using a Butler configured to talk to the local filesystem, but it will be more convenient and desirable to have the Butler talk directly to the Data Backbone.
The initial version of this is for Science Pipelines QA on processed HSC data and does not access SDSS or WISE data in the PDAC.