Overview

PDAC version 1 is primarily a demonstration of our ability to combine elements of the Infrastructure, Data Access & Database, and SUIT components of DM into an end-to-end system, about one-third of the way through LSST construction.  The system as delivered should not be taken as a demonstration of what we expect the final system to look like from a user's perspective.  The main development effort in PDACv1 went into building the connections between the layers and wringing out a variety of issues exposed by this end-to-end testing.

PDAC version 1 serves a dataset produced by a Summer 2013 DM-stack reprocessing of the SDSS Stripe 82 data.  This reprocessing built coadds in all five SDSS filter bands and generated a table of measurements on coadds (most closely related to the ultimate LSST "Object" table, but not based on any Multifit-like analysis).  The i-band detections were then used as seeds for forced photometry on every epoch in all five bands, producing output most closely related to the planned ForcedSource table.  No associations are recorded between the Objects in the five bands; the measurements in each band appear as separate rows in the table.  (This is a key difference from the ultimately planned LSST data model.)  Additional information is available on the Properties of the 2013 SDSS Stripe 82 reprocessing page.

The dataset contains both calibrated single-epoch images and single-band coadded images.  The raw images are not being served through PDACv1.

We believe that this is a scientifically useful dataset and that we will gain useful experience from attempts to use it for scientific purposes.  We are therefore interested, within reason, in supporting a very small number of test users (see the access policy in LDM-482) and in addressing issues that arise in attempting to use the system and the data.

The image data are stored as FITS files in the usual afw output format.  The image metadata catalogs are in conventional MariaDB databases, and the Object-like and ForcedSource tables are in a Qserv spatially-partitioned database.  The partitioning should allow near-neighbor (spatial join / correlation) queries out to a radius of 1 arcmin, the limit documented in LDM-135, section 3.3.6.  This capability has not been extensively tested on the PDACv1 dataset, although it was tested in the Database group's development work that led up to PDAC.
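
To make the 1 arcmin limit concrete, the following is a minimal Python sketch of the underlying geometry: it computes the great-circle separation between two positions and checks it against that radius.  It is only an illustration of what "near neighbor" means here, not the way Qserv evaluates such queries; the function names are hypothetical.

```python
import math

ARCMIN_DEG = 1.0 / 60.0  # 1 arcminute expressed in degrees


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees; all inputs are in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine form, which stays accurate for the small angles of interest.
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a))))


def within_supported_radius(ra1, dec1, ra2, dec2, radius_deg=ARCMIN_DEG):
    """True if the two positions are no further apart than radius_deg."""
    return angular_separation_deg(ra1, dec1, ra2, dec2) <= radius_deg
```

Pairs separated by more than this radius may fall outside what the partitioning can match in a spatial join.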

The image and catalog data are accessible through a preliminary version of the webserv REST API (see the API page on Confluence for some information, though it does not precisely define the as-delivered behavior of the services running in PDACv1).  If there is interest in directly querying the database through the dbserv part of this API, we can provide limited support for a small number of users.  

For those tables that are hosted in Qserv, the Qserv manual may be a useful reference for those constructing direct queries.

The user interface currently (as of early February) provides positional query facilities for each of the following tables:

  • Single-band coadded image metadata (DeepCoadd table)
  • Calibrated single-epoch image metadata (Science_Ccd_Exposure table)
  • Measurements on coadds (RunDeepSource table)
  • Forced photometry based on i-band coadds (RunDeepForcedSource table)

Image and catalog query results will be displayed using Firefly's native toolkit.

A very brief list of basic capabilities of the portal is on the PDAC access page, which we expect to integrate with the present page soon.  Note that IRSA's public data holdings are available through PDACv1.  In particular, the catalog from the first Gaia public data release is available and may be of interest to PDAC users.

Accessing PDACv1

Access to PDACv1 is limited, as discussed in LDM-482.  Please do not attempt to use the system without permission; it is essential that we know who is using the system so that we can contact users if problems arise.

Because the web services and other exposed interfaces of the system have not yet been subjected to formal security review, they are currently in a "walled garden" network environment behind a VPN.  Access requires VPN credentials from NCSA.  The VPN address is vpn.ncsa.illinois.edu.

With the VPN connected, PDACv1's portal-style user interface can be reached at http://lsst-sui-proxy01.ncsa.illinois.edu/suit .

The DAX REST services mentioned above are available at http://lsst-qserv-dax01.ncsa.illinois.edu:5000/.  We strongly recommend that you ask for advice before attempting to use the REST services directly.  In particular, please do not perform unlimited "SELECT * FROM RunDeep[Forced]Source" queries (i.e., without restrictive WHERE clauses) on dbserv at this time.  The system is not scaled for this.
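
As a rough illustration of what a suitably restricted query might look like, the Python sketch below builds a SELECT that names its columns explicitly, limits the search to a small circle on the sky, and caps the number of rows returned.  It assumes the Qserv spatial restriction function qserv_areaspec_circle(ra, dec, radius), with all arguments in degrees, as we understand it from the Qserv manual.  The helper name and the column names in the usage note (id, coord_ra, coord_decl) are assumptions for illustration; please check them against the actual schema.  The sketch deliberately stops at producing the query string, because the request format of the as-delivered dbserv endpoint is not defined on this page.

```python
def build_cone_query(table, ra_deg, dec_deg, radius_deg, columns, limit=1000):
    """Return a SELECT restricted to a small sky circle, with a row cap."""
    if not columns:
        raise ValueError("list explicit columns rather than using SELECT *")
    if not (0.0 < radius_deg <= 1.0):
        # The 1 degree ceiling is an arbitrary safety choice for this sketch.
        raise ValueError("keep the search radius small (at most 1 degree here)")
    if limit <= 0:
        raise ValueError("limit must be positive")
    column_list = ", ".join(columns)
    # The area restriction keeps the query away from a full-table scan.
    return (
        f"SELECT {column_list} FROM {table} "
        f"WHERE qserv_areaspec_circle({ra_deg:.6f}, {dec_deg:.6f}, {radius_deg:.6f}) "
        f"LIMIT {int(limit)}"
    )
```

For example, build_cone_query("RunDeepSource", 9.5, -1.1, 0.05, ["id", "coord_ra", "coord_decl"], limit=100) yields a query confined to a 0.05 degree circle and at most 100 rows.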

Communicating with the PDAC team

We currently recommend the LSST Slack #dm-pdac channel for everyday communications.  Please subscribe to this channel.  We will use "@channel" notifications when making service changes.

A rudimentary automated monitoring service based on Nagios has been set up, and the set of things it tracks will be extended regularly.  Alerts from this service are directed to the #dm-pdac-nagios channel on Slack.  We recommend that you subscribe to this channel as well.  Later on we hope to make a "dashboard" web page with system status available.

We may soon be setting up a JIRA project specifically for PDAC user issue reporting.  If/when that becomes available, it will be announced in #dm-pdac.

PDAC team meetings are held every other Thursday at 09:00 Pacific time, in the phase of , on https://bluejeans.com/383721668.  Users are encouraged to attend.

Caveats

There are significant caveats associated with the PDACv1 deployment.

  • There is no operational staffing in DM at this time.  PDAC is supported by its developers during their working hours.  (Some lower-level sysadmin work is supported by NCSA internal operations staff, in response to the developers.)
  • New features and bug fixes will be deployed on short notice.  The #dm-pdac channel on LSST Slack is the only mechanism that will be used to announce outages or restarts.
  • The underlying "LSST integration environment" at NCSA is itself continually under construction and being refreshed, so service may be intermittent, though again we will do our best to provide notice.
  • During the deployment of PDACv1, a substantial fraction of the single-epoch calibrated exposures that were generated in the Summer 2013 processing were found to have been irrevocably lost at some point in the intervening years.  The corresponding single-epoch photometry is available, and the available coadds reflect the inclusion of these exposures, but they are not available for viewing.
  • The dataset is based on processing done with DM code that is nearly four years out of date.  There is no general guarantee that current DM stack code is backward compatible with this data.  Spot tests show that a variety of things work correctly with the recent v12 release (in particular, it appears that the v12 stack can still do forced PSF photometry on the calibrated images, though this has only been tested at the "does it crash?" level), but it is known that there will be problems with v13.
  • Facilities for resource management and monitoring are at best rudimentary.  It is relatively easy to submit a large number of long-running queries and lose sight of the impact this has on the back-end database resources.  We strongly encourage users to interact with us (on the Slack #dm-pdac channel) if they see unusual behavior or the system becomes unresponsive.
  • Queries returning very large result sets are not yet realistic given the absence of a "user workspace" for storing the results.  The current interfaces are unsuitable for "just give me the whole dataset" queries.
  • Full-table-scan queries on the forced photometry data are currently quite slow.  The Database group is interested in understanding the performance of the query optimizer on large, realistic queries, so we do encourage experimentation and prompt reporting of the results.
  • No resources are available to attempt to correct any issues with the numerical content of the data.  This is a frozen dataset.  There are no current plans to reprocess the SDSS data with more recent versions of the stack.
  • The forced photometry data are not corrected for photometric zero points.  Corrected fluxes and magnitudes are available via a join with the single-epoch image metadata table (using the column Science_Ccd_Exposure.fluxMag0) and the use of the appropriate sciSQL functions (scisql_dnToFlux[Sigma] and scisql_dnToAbMag[Sigma]).  The SUIT light-curve viewer will apply this correction, and also provide corrected uncertainties on the measurements.
  • Virtually all the effort in this cycle has gone into the low-level engineering of putting all the pieces together in the NCSA environment, rather than into refining the user experience (UX) or customizing the presentation to the particular properties of the Stripe 82 dataset.
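
For readers who want to see the arithmetic behind the zero-point correction mentioned in the caveats above, here is a minimal Python sketch.  It assumes the standard AB convention, in which fluxMag0 is the raw flux (in DN) of a zero-magnitude source, and first-order error propagation for the uncertainty.  We believe this mirrors what the sciSQL functions compute, but that is an assumption rather than a statement about their implementation; the Python function names are hypothetical.

```python
import math

# Flux density of an m_AB = 0 source, in erg / cm^2 / s / Hz.
AB_ZERO_FLUX_CGS = 10.0 ** (-48.6 / 2.5)


def dn_to_ab_mag(dn, flux_mag0):
    """AB magnitude from a raw flux in DN; None if the flux is not positive."""
    if dn <= 0 or flux_mag0 <= 0:
        # Forced photometry can legitimately return non-positive fluxes.
        return None
    return -2.5 * math.log10(dn / flux_mag0)


def dn_to_ab_mag_sigma(dn, dn_sigma, flux_mag0, flux_mag0_sigma):
    """First-order uncertainty on the AB magnitude; None if undefined."""
    if dn <= 0 or flux_mag0 <= 0:
        return None
    relative_error = math.hypot(dn_sigma / dn, flux_mag0_sigma / flux_mag0)
    return 2.5 / math.log(10.0) * relative_error


def dn_to_flux(dn, flux_mag0):
    """Calibrated flux density in erg / cm^2 / s / Hz."""
    return AB_ZERO_FLUX_CGS * dn / flux_mag0
```

Working in calibrated flux rather than magnitude avoids the undefined cases, which matters for faint sources whose forced measurements scatter around zero.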

With all of these in mind, please do know that the entire PDAC team will be grateful for the feedback and operational experience your use of the system provides us.

Temporarily incomplete features

There are several features that the SUIT group are still working on for "PDACv1.1", as it were, over the next 4-8 weeks:

  • The PDAC Firefly online help is in the process of being adapted from its IRSA versions.  Much of a first pass at this should be done by  .
  • The Firefly UI supports non-positional, "all sky" searches (and these are already available at IRSA); however, the interfacing required to map these onto the DAX services is still being completed.
  • A new Firefly light-curve viewer, incorporating phase-folding and access to the IRSA periodogram service, is still being completed.
  • We will also be linking available Summer-2013 processing documentation to this page as time permits.

Things that are not there

Many expected features of the LSST data model, services, and science user interface are not yet available in PDAC.  Some highlights:

  • Data
    • There is no analog to the Alert Production / L1 data products other than the existence of calibrated single-epoch images.
    • There is no analog to the DRP "Source" table - no single-epoch detections were run.
    • No Multifit processing was performed.
    • No association processing was performed.  The only inter-table linkages are between the i-band coadded photometry (Object-like table) and the forced photometry in all five bands, and the links from measurement tables to the appropriate image metadata records.
    • The calibration data products (e.g., flats) that were used for the processing are not available in PDACv1.
    • No coverage or depth maps are available.
    • No RGB images or hierarchical all-sky images were created.
    • No panchromatic coadds or high-resolution subtraction templates were generated.
    • No provenance data, other than links from measurements to images, are available.
  • Services
    • No asynchronous query service is available at the Data Access layer, so the user interface has little ability to let the user manage long-running queries.
    • No "next-to-the-database" processing layer is yet available, so there is no provision for any kind of distributed computational afterburner on the results of a distributed (Qserv) query.
    • Neither the database nor the Data Access layer supports the creation of user databases, either in the conventional MariaDB system or the Qserv system.  (I.e., there is no support for "Level 3 data products".)
    • There is no LSST-provided Python interface to the Data Access REST APIs, and in particular, no means are provided for the recreation of the original afw tabular data model from the results of queries.
    • No per-user authorization capabilities are available for the Data Access services.
    • PDACv1 does not contain user-processing nodes (e.g., for running Jupyter/IPython notebooks "close to" the data).
    • The DAX dbserv REST API currently only supports returning query results in JSON or VOTable (TABLEDATA - i.e., pure XML - serialization only) formats.  More space-efficient formats will be supported in the future.
  • User Interface
    • No user customizability of the portal interface is yet provided.
    • The system does not yet provide for user logins (which are also not supported at the Data Access layer yet) and therefore does not support any persistent association of state with a user.
    • Users cannot currently treat the whole of Stripe 82 as a single (coadded) image that they can explore down to the single-pixel level by panning and zooming.  In other words, no all-sky viewer is available.
    • The context ("zoomed out", or "coverage" in IRSA terminology) images showing the locations of images that are selected in PDAC are currently drawn from a variety of IRSA (i.e., primarily infrared) datasets, and they are not always well-suited to understanding the Stripe 82 dataset.  We have not attempted to optimize this at all yet.