Date

Attendees

Goals

  • Initial meeting to share information and Kubernetes expertise as input into planning the details of the upcoming LSST Data Facility Kubernetes installation.

Discussion items

Hardware Status
  • Hardware is physically installed
  • Wiring up networking and power Feb 9
  • Initial installation week of Feb 12-16
  • 20 nodes, each with:
    • 2x 16-core 2.1 GHz CPUs
    • 192 GB RAM
    • 2x 1.2 TB 10K RPM SAS 12 Gbps drives

Science Platform

'Aspects' of 'Science Platform'

  1. api
    • Qserv and databases - affinity
    • web RESTful services - ingress controllers for external visibility
  2. portal - user web applications
    • SUI tools
    • Firefly
  3. notebook - JupyterHub
    • Python computing environment for users
    • access point to batch computing (no batch available yet)
    • more elastic than the other aspects (scales with current users)
    • requires ingress controllers for external visibility
      • actually all 3 aspects need ingress (see the ingress sketch below)
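
  Since all three aspects need external visibility, each would get an Ingress resource in front of its service. A rough sketch using the Python kubernetes client; the namespace, hostname, and service name are placeholders, and the choice of ingress controller is still open.

      from kubernetes import client, config

      config.load_kube_config()

      # Hypothetical Ingress for the api aspect; portal and notebook would be analogous.
      ingress = {
          "apiVersion": "networking.k8s.io/v1",
          "kind": "Ingress",
          "metadata": {"name": "api-aspect", "namespace": "sp-api"},
          "spec": {
              "rules": [{
                  "host": "api.lsst.example.org",  # placeholder hostname
                  "http": {"paths": [{
                      "path": "/",
                      "pathType": "Prefix",
                      "backend": {"service": {"name": "api-web", "port": {"number": 80}}},
                  }]},
              }],
          },
      }

      client.NetworkingV1Api().create_namespaced_ingress(namespace="sp-api", body=ingress)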

5 Official Instances of 'Science Platform'

  1. integration cluster - PDAC
  2. science validation - timing?
    • inward-facing
    • testing with large datasets
  3. commissioning cluster in La Serena
  4. Chilean DAC
  5. US DAC

Possible Development Work Areas for Kubernetes

  • general development
  • prerelease deployment testing
Access
  • namespaces
    • match up with Science Platform aspects or deployments
    • understand which namespaces need to talk to each other
  • roles
    • 'admin' access to a namespace (see the namespace/RBAC sketch below)
  • access
    • run kubectl commands from lsst-bastion
    • future - may need limited sudo access to run specific commands - wait until the need is obvious
  • web dashboard is occasionally useful
  • Qserv - unique needs
    • needs OS admin access to move data around on the bare-metal nodes
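
  As a concrete starting point for the namespace/role discussion, a sketch using the Python kubernetes client as it might be run from lsst-bastion; the namespace and group names are placeholders, and binding the built-in 'admin' ClusterRole is just one way to grant per-namespace admin access.

      from kubernetes import client, config

      config.load_kube_config()  # assumes a valid kubeconfig on lsst-bastion

      core = client.CoreV1Api()
      rbac = client.RbacAuthorizationV1Api()

      # One namespace per Science Platform aspect (placeholder names)
      for ns in ("sp-api", "sp-portal", "sp-notebook"):
          core.create_namespace({"apiVersion": "v1", "kind": "Namespace",
                                 "metadata": {"name": ns}})

      # Grant a team 'admin' within a single namespace via the built-in admin ClusterRole
      role_binding = {
          "apiVersion": "rbac.authorization.k8s.io/v1",
          "kind": "RoleBinding",
          "metadata": {"name": "sp-portal-admins", "namespace": "sp-portal"},
          "subjects": [{"kind": "Group", "name": "lsst-sui-devs",  # placeholder group
                        "apiGroup": "rbac.authorization.k8s.io"}],
          "roleRef": {"kind": "ClusterRole", "name": "admin",
                      "apiGroup": "rbac.authorization.k8s.io"},
      }
      rbac.create_namespaced_role_binding(namespace="sp-portal", body=role_binding)
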
Log File Aggregation
  • logging of Docker/pod logs (not bare-metal node logs)
  • forward them to an ELK stack (see the forwarding sketch below)
  • may become a built-in Kubernetes feature in the future
  • what does NCSA want?
    • should Docker/pod logs be forwarded into centralized log analysis?
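
  Until a cluster-wide log agent (e.g. a Fluentd or Filebeat DaemonSet) is chosen, a rough sketch of what "forward pod logs into ELK" means, using the Python kubernetes client and plain HTTP to Elasticsearch; the Elasticsearch URL, index, and namespace are placeholders, and a real deployment would use a node-level agent rather than polling.

      import requests
      from kubernetes import client, config

      config.load_kube_config()
      v1 = client.CoreV1Api()

      ES_URL = "http://elk.ncsa.example:9200"  # placeholder ELK endpoint
      namespace = "sp-notebook"                # placeholder namespace

      # Pull recent log lines from every pod in the namespace and index them.
      for pod in v1.list_namespaced_pod(namespace).items:
          log_text = v1.read_namespaced_pod_log(pod.metadata.name, namespace, tail_lines=100)
          for line in log_text.splitlines():
              requests.post(
                  f"{ES_URL}/pod-logs/_doc",
                  json={"namespace": namespace, "pod": pod.metadata.name, "message": line},
              )
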
Local Container Registry
  • a local registry is critical for usability
  • ability to push images from LSST CI (see the push sketch below)
  • is there a vetting process?
  • probably a dedicated machine with plenty of network connectivity, primarily to the k8s cluster
  • consider Nexus 3.0 as the registry software - Brian Van Klaveren
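
  A sketch of what "push images from LSST CI" into the local registry could look like with the Docker Python SDK; the registry hostname and image names are placeholders, and any vetting step would sit before the push.

      import docker

      REGISTRY = "registry.lsst-k8s.example:5000"  # placeholder local registry
      client = docker.from_env()

      # Image produced or pulled by CI (placeholder name)
      image = client.images.pull("example/lsst-stack", tag="latest")

      # Re-tag for the local registry and push it
      image.tag(f"{REGISTRY}/lsst/lsst-stack", tag="latest")
      for status in client.images.push(f"{REGISTRY}/lsst/lsst-stack", tag="latest",
                                       stream=True, decode=True):
          print(status)
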
Shared Storage
  • local container registry cache
  • persistent user storage
    • SQRE thinking this is GPFS - is this fs0/home?
    • in development, users would prefer all their normal GPFS mounts
  • Frossie says they prefer NFS access to GPFS rather than native GPFS mounts
  • possibly shared writable space? - e.g. /project or /scratch
  • shared data (e.g. read-only copy of GPFS /datasets/, via NFS) (see the NFS volume sketch below)
  • for science users, VO space (on top of GPFS, etc.)
  • local node storage
    • for cached images
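
  For the "NFS access to GPFS" option, a sketch of how a read-only /datasets export could be surfaced to pods as a PersistentVolume and Claim, again with dict manifests and the Python kubernetes client; the NFS server, export path, sizes, and namespace are placeholders.

      from kubernetes import client, config

      config.load_kube_config()
      v1 = client.CoreV1Api()

      # GPFS /datasets exported over NFS (server and path are placeholders)
      pv = {
          "apiVersion": "v1",
          "kind": "PersistentVolume",
          "metadata": {"name": "lsst-datasets"},
          "spec": {
              "capacity": {"storage": "100Ti"},
              "accessModes": ["ReadOnlyMany"],
              "nfs": {"server": "nfs.ncsa.example", "path": "/datasets", "readOnly": True},
              "persistentVolumeReclaimPolicy": "Retain",
          },
      }

      pvc = {
          "apiVersion": "v1",
          "kind": "PersistentVolumeClaim",
          "metadata": {"name": "lsst-datasets", "namespace": "sp-notebook"},
          "spec": {
              "accessModes": ["ReadOnlyMany"],
              "resources": {"requests": {"storage": "100Ti"}},
              "volumeName": "lsst-datasets",
              "storageClassName": "",
          },
      }

      v1.create_persistent_volume(body=pv)
      v1.create_namespaced_persistent_volume_claim(namespace="sp-notebook", body=pvc)
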
User Mapping
  • mapping users to OAuth identities in JupyterHub
    • currently CILogon is not providing UIDs that match NCSA LDAP
      • this is a blocker for mounting GPFS filesystems (see the mapping sketch below)
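
  One possible workaround sketch for the UID mismatch, assuming JupyterHub with the CILogon OAuthenticator: look up the NCSA LDAP uidNumber for the authenticated username in a pre-spawn hook and pass it to the spawner so GPFS mounts get the right ownership. The LDAP host, base DN, and the NB_UID environment-variable convention are assumptions, not a decided approach.

      # jupyterhub_config.py (fragment)
      from ldap3 import Connection, Server
      from oauthenticator.cilogon import CILogonOAuthenticator

      c.JupyterHub.authenticator_class = CILogonOAuthenticator

      LDAP_HOST = "ldap.ncsa.example"              # placeholder LDAP host
      LDAP_BASE = "ou=People,dc=ncsa,dc=example"   # placeholder base DN

      def lookup_uid(username):
          """Return the NCSA LDAP uidNumber for a username, or None."""
          conn = Connection(Server(LDAP_HOST), auto_bind=True)
          conn.search(LDAP_BASE, f"(uid={username})", attributes=["uidNumber"])
          return conn.entries[0].uidNumber.value if conn.entries else None

      def set_numeric_uid(spawner):
          """Pre-spawn hook: hand the LDAP UID to the notebook container."""
          uid = lookup_uid(spawner.user.name)
          if uid is None:
              raise RuntimeError(f"No NCSA LDAP entry for {spawner.user.name}")
          # NB_UID follows the Jupyter docker-stacks convention (assumption)
          spawner.environment.update({"NB_UID": str(uid)})

      c.Spawner.pre_spawn_hook = set_numeric_uid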