Discussion items

30 min: NCSA cluster monitoring plans
  • Looking for alternatives that are less limited than Nagios
  • Considered Ganglia. Liked its aggregation capabilities.
  • Now focusing on Grafana (graphics) + InfluxDB (time series database) + collectd (per-node data collector)
    • InfluxDB seems to perform better than Elasticsearch
    • Demonstrated a variety of features for defining graphs and dashboard layouts
    • Currently hosted on lsst-monitor01 on the internal NCSA network. Access requires a login, so it should be possible to expose it more widely after a security review.
    • Currently monitoring the verification cluster; Nebula is next. Discussed extending monitoring to the Qserv cluster.
10 min: PDAC development plan for F17 (Gregory Dubois-Felsmann)
  • Science Platform and PDAC development and deployment in 2017
10 min: Dedicated data file system for the PDAC Qserv master (Igor Gaponenko)
  • Recent incidents: IHS-299 and IHS-276
10 min: Status of accessing the auth/auth system from SUIT/Firefly
  • Able to connect successfully to CILogon
  • Still need user "role" data to be passed back
  • Still need the ability to log out
  • Still need an Apache SSL certificate for the SUI-proxy machine


Simon Krughoff: Is it possible for NCSA to monitor the #dm-infrastructure Slack channel more regularly in order to answer questions? (This is not for formal incident response, which will go through tickets.)

Action items