Deployment

Versions for Kubernetes/Docker/other components suggested based on the versions at IN2P3

Helm appears to require at least Kubernetes 1.8, which is later than what IN2P3 runs; jupyterlab-demo is using 1.8.3 (kubectl 1.8.4); we will use that since Qserv wants to upgrade anyway

  • Should try to share technical stack (e.g. Helm vs. Terraform)

Overlay network must support multicast for xrootd in Qserv

NCSA is responsible for the Kubernetes commons installation, as well as the PDAC infrastructure (hardware, installation, common image registry/storage); other administration of PDAC is shared

  • Cluster definition is unclear right now
  • Pod configuration falls to both developers and NCSA

Expertise will be shared initially (e.g. in #dm-kubernetes), but someone from LDF will need to take over coordination eventually

Kubernetes configuration version control needs to be defined and standardized

Definition of entire LSP configuration may differ slightly from concatenation of sandboxes for each Aspect (e.g. sharing an overall load balancer); LDF will eventually own this for operational deployments

LDF needs to audit configurations for security purposes

When do Aspects need to be running in PDAC?

  • Qserv's goal is the end of the cycle in May; aspirationally, all of the LSP should be able to do coordinated releases by then, so a rough version is needed in Feb/Mar

All Aspects are already using DockerHub for container storage

  • Steve Pietrowicz to check if the local image registry is part of the Kubernetes commons hardware order
  • Stack images are 4 GB uncompressed, 1 GB compressed
  • Doubles when JupyterHub is added; about 3 GB for each image
  • Built nightly, aggressively cleaned up for now
  • Running in 50 GB today
  • 500 GB should be sufficient, including CI-built containers (a rough sizing sketch follows this list)
  • This was not provisioned as part of the k8s commons hardware order, and the best way to implement this storage needs to be investigated.  It could be network-mounted storage (more flexible) or local storage (somewhat problematic if nodes go down or pods switch nodes).
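
A rough sizing check, using only the ~3 GB-per-image figure above; the number of images per night, the retention window, and the CI allowance are assumptions for illustration, not agreed numbers:

    # Rough registry sizing; 3 GB/image is from the notes above, the rest are
    # illustrative assumptions.
    gb_per_image = 3
    images_per_night = 2        # assumed: stack image + JupyterLab variant
    retention_days = 60         # assumed retention window for nightlies
    ci_allowance_gb = 100       # assumed headroom for CI-built containers

    total_gb = gb_per_image * images_per_night * retention_days + ci_allowance_gb
    print(f"~{total_gb} GB")    # ~460 GB, consistent with the 500 GB estimate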

Want to have CI build containers, push, deploy, and test

  • Stack should be a third-party package for DAX
  • Unclear if resources can be freed up to separately build Qserv and webserv APIs from the rest of the Stack
  • Jenkins is used to build Firefly and pushes and deploys it onto the IPAC environment; testing does not include firefly_client yet; soon containers will be built automatically; would like LSST CI to build it
  • Jenkins performance is an issue
    • Jenkins py2 nodes can be recovered after 15.0 release
    • Adding workers at NCSA might be a possibility; some advantages to doing so in terms of close links between build and deploy, minimizing firewall and data access issues; this is in SQuaRE plans already (after improvements for Science Pipelines are complete but still in S18, initially on Nebula but later on Kubernetes commons)
  • Carving out a part of the existing Jenkins might be better than setting up a separate one, but the load still needs to be distributed to more than just Josh
  • LSP overall CI, not just at Aspect level
    • Don't currently have dbserv/qserv integration tests; want to build them; then scripted Firefly talking to dbserv talking to qserv
    • Need someone designated as LSP QA engineer to be responsible for performing deployment according to recipe and executing tests, both automated and manual; release manager not likely to have sufficient bandwidth to perform all of this
  • Aiming to get concrete proposal by the end of the week
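
A minimal sketch of the build-push-deploy-test flow described above, driving the Docker and kubectl command lines from Python; the registry, image, deployment name, and smoke-test URL are hypothetical placeholders, not the real services:

    # Sketch of a CI step: build a container, push it, roll it out, smoke-test it.
    # All names (registry, image, deployment, endpoint) are placeholders.
    import subprocess
    import urllib.request

    TAG = "registry.example.org/lsp/dax_webserv:nightly"   # hypothetical image

    def run(*cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    run("docker", "build", "-t", TAG, ".")
    run("docker", "push", TAG)

    # Point the running Deployment at the new image and wait for the rollout.
    run("kubectl", "set", "image", "deployment/dax-webserv", f"dax-webserv={TAG}")
    run("kubectl", "rollout", "status", "deployment/dax-webserv")

    # Very small smoke test: the deployed service should answer an HTTP request.
    with urllib.request.urlopen("https://lsp-int.example.org/dbserv/") as resp:
        assert resp.status == 200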

Portal plugins will need separate containers; this needs to be figured out

Need to figure out how to have multiple versions of components running at the same time

Need to have monitoring tools installed and running as well: deploy at least the Kubernetes Dashboard; Fritz will do this for now; Matias has also done it for other NCSA deployments

All logs from all pods need to be collected

  • jupyterlab-demo has a daemonset (one pod on each node) that collects Docker logs and ships them to ELK at NCSA; Adam/JMatt will make sure something similar is available for PDAC; this will need to be coordinated with Paul Domagala, who expects to be responsible for log collection and analysis
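
A sketch of the log-shipping DaemonSet idea above, expressed with the Kubernetes Python client; the namespace, Filebeat image tag, and mount paths are assumptions, and the shipper's own configuration (filebeat.yml pointing at Logstash/ELK) is omitted:

    # One log-shipper pod per node (DaemonSet) that mounts the Docker log
    # directory and forwards to ELK. Names and versions are placeholders.
    from kubernetes import client, config, utils

    manifest = {
        "apiVersion": "apps/v1",   # older clusters (e.g. 1.8) may need apps/v1beta2
        "kind": "DaemonSet",
        "metadata": {"name": "log-shipper", "namespace": "lsp-logging"},
        "spec": {
            "selector": {"matchLabels": {"app": "log-shipper"}},
            "template": {
                "metadata": {"labels": {"app": "log-shipper"}},
                "spec": {
                    "containers": [{
                        "name": "filebeat",
                        "image": "docker.elastic.co/beats/filebeat:6.2.4",  # assumed tag
                        "volumeMounts": [{"name": "dockerlogs",
                                          "mountPath": "/var/lib/docker/containers",
                                          "readOnly": True}],
                    }],
                    "volumes": [{"name": "dockerlogs",
                                 "hostPath": {"path": "/var/lib/docker/containers"}}],
                },
            },
        },
    }

    config.load_kube_config()
    utils.create_from_dict(client.ApiClient(), manifest, namespace="lsp-logging")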

It would be helpful for DAX if we could build an afw container without the rest of lsst_distrib; Qserv depends on only a few low-level packages

Future Datasets

WISE is loaded, including NEOWISE year 1

  • Years 2 and 3 make the data more interesting for science (increases proper motion baseline)
  • Low priority without strongly-motivated use case

HSC will be next; mostly wide field, as the deep fields are not useful (small number of epochs); no more data until 2019

ZTF is proprietary, but will probably be needed to practice Alerts and provides data with many epochs

DECam data could be used for many epochs; up to Eric Bellm to decide if it is useful for AP

DESC DC2 ImSim data (expected June 2018) will have deep fields with many epochs; DM-SST needs to think about how this could be used; deciding on this at March DM All Hands is OK

  • 50000 visits, 2/3 Wide/Fast/Deep on 300 sq deg, 1/3 simulated deep drilling on 10 sq deg, all bands
  • For functional testing, don't necessarily need CatSim "truth" catalog
  • Frossie Economou will look for David Nidever-collected list of requests for simulated data and provide to GPDF

Gaia is useful; DR2 will have proper motions; it is only ~300 GB; this should be done

Which datasets are loaded into the Science Validation instance is also a question

High Level Tests

After the Kubernetes work, turn over to a science user toward the end of S18

In early 2019, a science user could use the above datasets, e.g. someone from DESC working with DESC DC2 data

In the future, we should write responses to test reports

Next-to-Database Processing

Looking at ways to provide an Apache Spark-type analysis framework to process catalog data

Not just shard-level processing; framework also needs to provide capability for merge/reduce at end

Constraint: SQL queries can't be disrupted; resources need to be shared

  • Could have another replica in a different format, but would likely require more than two copies
  • Or could have all data stored in a shared format

Could also be just for Object?

  • ObjectExtra, Source, ForcedSource are the most interesting for non-SQL analysis

Originally this feature was to be provided towards the end of development; now need to at least define the feature sooner

Real-time streaming out of the database may limit how fast Qserv can run; I/O of intermediate products may also slow down Qserv

Discussions between John and Colin suggested streaming chunks out of a query to a cluster of Nebula/OpenStack machines, either not pre-joined or pre-joined into a non-SQL result; Kafka could be used between the two to handle flow control
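
A minimal sketch of the Kafka hand-off, using the kafka-python package; the broker, topic name, and chunk payloads are illustrative only:

    # Qserv side produces result chunks; the analysis cluster consumes them at its
    # own pace, with Kafka providing the buffering/flow control in between.
    import json
    from kafka import KafkaProducer, KafkaConsumer

    BROKERS = ["kafka.example.org:9092"]      # placeholder broker
    TOPIC = "qserv-result-chunks"             # placeholder topic

    # Producer (next to Qserv): push each chunk as it is materialized.
    producer = KafkaProducer(bootstrap_servers=BROKERS,
                             value_serializer=lambda d: json.dumps(d).encode())
    for chunk_id in range(3):                 # stand-in for real chunk iteration
        producer.send(TOPIC, {"chunk": chunk_id, "rows": []})
    producer.flush()

    # Consumer (analysis cluster): workers share a consumer group, so chunks are
    # spread across them and unread chunks stay buffered in Kafka.
    consumer = KafkaConsumer(TOPIC, bootstrap_servers=BROKERS,
                             group_id="ntd-workers", auto_offset_reset="earliest",
                             value_deserializer=lambda b: json.loads(b.decode()))
    for message in consumer:
        chunk = message.value                 # hand off to the analysis code
        break                                 # sketch only: stop after one chunk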

Time series analysis is a classic use case; making an all-sky map can be done using GROUP BY, but varying projections, unusual aggregation functions, or joins required for reddening could be an issue

Pre-selection in SQL is desirable; this would reduce the number of columns significantly but would likely not reduce the number of rows very much; this may still cause a problem for Qserv, which is row-oriented

Users must understand that they are writing parallel code, but Spark helps
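
A small PySpark sketch of that pattern: pre-select only the needed columns (ideally pushed down into SQL), then run a per-object time-series aggregation in parallel; the input path and column names are assumptions, not the real schema:

    # Column pre-selection plus a per-object aggregation over ForcedSource-like data.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("ntd-sketch").getOrCreate()

    # Only the columns the analysis needs (the "pre-selection" step).
    src = (spark.read.parquet("/datasets/forced_source")      # placeholder path
                .select("objectId", "filterName", "midPointTai", "psFlux"))

    # Classic time-series use case: per-object, per-band variability statistics.
    stats = (src.groupBy("objectId", "filterName")
                .agg(F.count("psFlux").alias("nEpochs"),
                     F.mean("psFlux").alias("meanFlux"),
                     F.stddev("psFlux").alias("fluxScatter")))

    stats.write.parquet("/user/workspace/flux_stats", mode="overwrite")  # placeholder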

Next-to-database processing will need access to user Workspace, both files and tables

Shared scans benefit from row-orientation

  • Leveraging shared scans for next-to-database processing could help performance

Requirements and use cases should be exposed now to help direct research over the next 6 months

Suggestions to remove ForcedSource from Qserv would have many other effects (e.g. on Portal) that need to be considered

HEALPix is valuable to have as a spatial index in the database when regions can be pre-defined in HEALPix; should be added to DPDD
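
A short healpy sketch of how a pre-computed HEALPix column could serve as that index: compute pixel IDs for object positions and for a region of interest, then keep only matching pixels; the nside and the stand-in coordinates are assumptions:

    # HEALPix pixel IDs as a coarse spatial pre-filter.
    import numpy as np
    import healpy as hp

    NSIDE = 64                                   # assumed resolution (~0.9 deg pixels)

    # Pixel ID per object (this would be the pre-computed index column).
    ra = np.array([150.1, 150.2, 10.0])          # degrees, stand-in data
    dec = np.array([2.2, 2.3, -30.0])
    obj_pix = hp.ang2pix(NSIDE, ra, dec, lonlat=True, nest=True)

    # Pixels covering a small disc around a region of interest.
    center = hp.ang2vec(150.15, 2.25, lonlat=True)
    roi_pix = hp.query_disc(NSIDE, center, radius=np.radians(0.5), nest=True)

    # The coarse spatial cut that a database index on the pixel column would serve.
    print(obj_pix[np.isin(obj_pix, roi_pix)])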

Unauthenticated Access

Even users with credentials used unauthenticated access for SDSS

  • Could resolve by making authenticated access simpler and faster
  • Password managers are more prevalent nowadays

Could have "bucket" shared users (permanent or temporary) to enable policy exceptions

Resource Management

Group quotas for storage can be handled by having separate group filesystems

Groups could perhaps be modeled like GitHub organizations; groups as "no-login" users?

  • Owners and members; group and group-admin group

Batch systems typically allow the submitter to choose which quota to charge

NCSA needs to think about what they want to build

API Aspect

Question about whether the Portal can connect to multiple data sources

  • No, Portal connects only to local data source; Commissioning Cluster may be a special case that can also connect to Observatory Ops Data Service

Question about what Web APIs the Web API Aspect can support

  • Is it just the defined APIs (dbserv, imgserv, SODA, VOSpace, etc.)? (Yes) Or a home for any other API that people would like to provide? (Not at present)
    • It has only been planned for the defined APIs with the webserv proxy as a front end
    • Set of data APIs provided to public Internet from the DACs (which also happen to be used by staff from internal instances)
    • Web API Aspect could accept additional services; would require some auditing of Auth/Auth and deployment
  • One master endpoint address per instance

Batch Processing

Model for using batch from Notebook; possibilities include:

  • Command line service (current minimal baseline is only this, using HTCondor)
  • Python API within cells (including workflow engine?); see the sketch after this list
  • Framework to execute entire notebooks in Batch (including workflow engine?)
  • UWS could manage batch jobs
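
A sketch of the "Python API within cells" option, using the HTCondor Python bindings to submit a containerized job (Docker universe) from a notebook; the image tag, script, and resource values are placeholders:

    # Submit a containerized batch job from a notebook cell via the HTCondor bindings.
    import htcondor

    job = htcondor.Submit({
        "universe": "docker",
        "docker_image": "lsstsqre/centos:7-stack-lsst_distrib-w_2018_10",  # assumed tag
        "executable": "run_analysis.sh",       # placeholder script in the submit dir
        "arguments": "--tract 9813",
        "request_cpus": "4",
        "request_memory": "8GB",
        "output": "analysis.out",
        "error": "analysis.err",
        "log": "analysis.log",
    })

    schedd = htcondor.Schedd()                 # the local schedd
    with schedd.transaction() as txn:          # older bindings; newer ones use schedd.submit(job)
        cluster_id = job.queue(txn)
    print("submitted cluster", cluster_id)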

API Aspect and possibly Portal could also submit batch jobs

Many non-batch processing models exist in the scientific Python ecosystem (e.g. Dask)

  • Need to be able to allocate nodes to support these
  • Should be integrated with next-to-database research work
  • Kubernetes as the way of doing this? Request a "bigger pod" to do the same task
    • Data I/O problems, difficult to do caching
    • Could have shared disk?
  • Is there a way to request multiple nodes through HTCondor that can talk to the submitting process? This would support Dask
  • HTCondor has a Docker universe that can execute containers in batch
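
A sketch of how Dask could ride on HTCondor, one possible answer to the multiple-nodes question above: workers are started as HTCondor jobs and dial back to a scheduler in the submitting (notebook) process; it uses the dask-jobqueue and dask.distributed packages, and the resource numbers are illustrative:

    # Start Dask workers as HTCondor jobs and connect a client from this process.
    from dask_jobqueue import HTCondorCluster
    from dask.distributed import Client
    import dask.array as da

    cluster = HTCondorCluster(cores=4, memory="8GB", disk="10GB")
    cluster.scale(jobs=4)                      # ask HTCondor for four worker jobs
    client = Client(cluster)

    # Trivial parallel computation spread across the HTCondor-provisioned workers.
    x = da.random.random((20000, 20000), chunks=(2000, 2000))
    print(x.mean().compute())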

How does a user know that what they are trying to do is too big for their Notebook pod?

Architecture Diagrams

Each team drew up a pod-level diagram of their Aspect and its interfaces

We also modified the top level diagram

  • Kian-Tat Lim will clean them up and merge them together using a consistent symbology

Other comments:

  • Need to add monitoring/ELK service to top diagram
  • Add an internal Filebeat for each pod, from a shared container
  • Add local container registry to top diagram
  • Each Aspect has its own Logstash, from a shared container
  • Should have one load balancer for each Aspect on different domains (usdac.lsst.org, api.usdac.lsst.org, notebook.usdac.lsst.org)
  • Portal Stack microservices need to talk to data sources
    • Could share containers with JupyterLab? Possibly, but could be difficult
    • Should be able to have Portal work when Notebook is down
    • Develop algorithmic work in notebook, configure it to be run as UI action in Portal stack microservice
    • Could use Jupyter spawner to start microservices
  • Need to make sure that Consolidated DB can accept authentications from services