Infrastructure meetings take place every other Thursday at 9:00 Pacific on the BlueJeans infrastructure-meeting channel: https://bluejeans.com/383721668
Date 31 Aug 2017
Goals
- Alignment of NCSA-provided services with program needs
- Ensure effective use of the current NCSA infrastructure
- Refinement and continuous improvement of services, resources and processes
- Plan for near- and medium-term activities
Topic / Who / Notes

Review of last meeting notes

Status updates
  Notes: See below

Open tickets & outstanding issues

Ramping up support levels (time permitting)
  Who: Unknown User (pdomagala)
  Notes:
  - Project needs
    - Need to involve science teams, developers
    - Need to build an "experts list" & call trees
  - Support level & schedules
    - During observing, expect 24x7 on-call for observing critical
  - Timeframe to implement
  - Tools

PDAC Status
  Who: Gregory Dubois-Felsmann
  Notes:
  - Need kickoff meeting for AA, every Firefly load balancer security assessment (IHS-388)
  - Cross-system "early integration" meeting at NCSA, week of Oct. 9; need: ops support
  - Work with Tony Johnson to deploy a puppetized server

Topics for next meeting
Status of LSST Infrastructure Projects
Disaster Recovery verification: Andrew Loftus
NCSA 3003 Refresh: Bill Glick

PROJECTS

Chile AA Deployment
- Temporary setup in NCSA 3003 'SET' rack A18
- Waiting on...
- Kay Avila started setting up pfSense appliance
- Need:
28 Aug 2017

L1 Cluster (40+ nodes)
28 Aug 2017
- Doug Fein, order by Sept. 30, pending proj. office sign-off

L1 Orchestration Test Stand
- Waiting for hardware
- This is also in Outstanding Orders. Does it need to be here also?
- What is the difference between "L1 Cluster" and "L1 Test Stand"?
  - Orchestration might deploy prompt processing payload
  - L1 Complete Test Stand = DAQ + all the messaging & forwarding s/w

Outstanding Orders: Michelle Butler
- Qserv-master (IHS-378)
  - Michelle Butler status?
  - Paul Domagala needs to get this unstuck
  - UPDATE: It seems that I have unfairly blamed AURA when, in fact, the order is still stuck at NCSA. Silly of me to assume that when I was told a couple weeks ago that the order would go out by the end of the week, that it actually would. I'm working on getting it unstuck.
- Deployment test nodes (4 Lenovo, 2 Dell (1 chassis))
- lsst-db (replacement for current host) (Dell R740)
  - Pending project finalization
- lsst-dbdev (replacements for systems in 3003)
- L1 Orchestration Test Stand
Container Management
https://github.com/lsst/LDM-564/tree/tickets/DM-11468
- Technology: Docker + Kubernetes? Other?
- Need timelines and priorities
- Potential use cases:
  - SUIT
  - JupyterHub
  - Qserv & DAC services
  - verification
  - alert distribution systems
  - squash
  - L1/L2 QC
  - developer services (docs, Jenkins, ...)
- Possibles:
  - Bulk data distribution
  - many, many micro-services (e.g. monitoring, pointing prediction service, TBD)
- Do we need to revisit object stores? The infra team (and others) believe so.

Provisioning
- Goal: combine the hardware provisioning systems for NCSA 3003 with those in NPCF.
Possible technologies:
OS Deployment
Software Package Repository Management

Oracle Testing: Michelle Butler
Puppet Baseline for LSST project-wide
Nebula Monitoring: Harathi Korrapati
- Working on integrating Nebula instance to monitor01 (facing some issues and working on it)
- Monitoring sites:
Cluster Monitoring: Harathi Korrapati
Action items
Please enter action items in the form:
Responsible Person, Due Date, Description