Date:
Attendees: Unknown User (pdomagala), Margaret Gelman, Donald Petravick, Joel Plutchak
NOTE: statistics cover the past 2 weeks
Scheduled Maintenance
See the LSST Service Status Page
- Note that the December maintenance window is moved from 12/21 to 12/14.
- Nebula will not be affected. The L1 test stand would have had OS upgrades early Thurs. morning, but I've asked the admins to hold off until Jan. in order to provide maximum stability during the early integration exercise.
- 2017-12-22 to 2018-01-01 (inclusive) is a University holiday period. Services will be operational. NCSA will respond to incidents based on business criticality. During the holidays, incidents should be submitted as JIRA IHS tickets, as we will not be monitoring Slack channels as closely as normal.
As-is Services
Incidents
1 created, 1 resolved.
Created & Resolved:
IHS-632 | Igor Gaponenko |
Discussion of Notable Issues
Unexpected reboot of lsst-qserv-db16 (IHS-606).
Firmware upgrades were successful and all nodes were returned to service Thur. 11/30. Since then, there have been no unplanned reboots.
Requests
4 created, 3 resolved.
IHS-631 | HTTP 301 redirect https://lsst-web.ncsa.illinois.edu/doxygen/(.*) -> http://doxygen.lsst.codes/stack/doxygen/$1 | Joshua Hoblitt |
IHS-638 | Install a tex distro on lsst-dev | Merlin Fisher-Levine |
IHS-632 | Various problems with NEBULA instances lsst-gapon-qserv-* | Igor Gaponenko |
IHS-654 | Christine Banek |
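The redirect requested in IHS-631 maps any path under /doxygen/ on lsst-web to the same path on doxygen.lsst.codes. A minimal sketch of that mapping in Python, for illustration only: the pattern and target are taken from the ticket summary, the function name `redirect_target` is hypothetical, and the real rule would presumably live in the web server configuration rather than in application code.

```python
import re

# Source pattern and target template as given in the IHS-631 summary.
SOURCE = re.compile(r"^https://lsst-web\.ncsa\.illinois\.edu/doxygen/(.*)$")
TARGET = r"http://doxygen.lsst.codes/stack/doxygen/\1"


def redirect_target(url):
    """Return the Location for a 301 response, or None if the rule does not apply.

    Hypothetical helper for illustration; not part of any deployed service.
    """
    match = SOURCE.match(url)
    if match is None:
        return None
    # Substitute the captured path into the target template.
    return match.expand(TARGET)
```

A URL outside the /doxygen/ tree returns None, so only matching requests would be redirected.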
Change Management
This process primarily targets requests that can be handled with current level of effort (LOE) resources. This process is also designed to detect and redirect items to the EVMS process if they exceed LOE resources.
Changes proceed through 6 stages:
1 | Initial Assessment | Check that the submitter has stated a plausible business case and the relevant T/CAM agrees |
2 | Feasibility Assessment | Check that the change is well formulated, addresses a project need, and is cost-effective. |
3 | Planning | A detailed implementation plan is created which takes into account impacts, resource needs, testing and verification. |
4 | Implementation | The plan is executed to implement the change. |
5 | Assessment | Verification of successful change & issues analysis |
6 | Closed | Document and formally close out the request. |
Open Change Requests
Key | Summary | Process Stage† | Reporter | P | Created | Status |
---|---|---|---|---|---|---|
IHS-576 | | Planning | Tim Morton | | 02/Nov/17 | This change has been approved and is tentatively scheduled for early CY18. |
IHS-580 | DM developers need a build/test environment that supports docker containers | Planning | Joshua Hoblitt | | 02/Nov/17 | Use case and requirements are being gathered. Unknown User (pdomagala) will discuss this with the PDAC working group this Thursday. |
IHS-612 | Implement debug and normal queues for developers on the verification cluster | Planning | Yusra AlSayyad | | 16/Nov/17 | Currently being planned and tested. Tentatively scheduled for deployment before 22/Dec/17. |
IHS-638 | Install a tex distro on lsst-dev | Closed | Merlin Fisher-Levine | | 04/Dec/17 | Approved by M. Butler and completed on 08/Dec/17. |
Heard on the Street This Week, but no Ticket Filed
New
It was suggested that per-user storage usage for each shared fileset be made available, preferably readable by any DM member.
Previous
- Several users expressed a desire to have the Intel compiler suite (icc) available on lsst-dev
- Increase ssh idle session timeout, which is currently 1 hr. (John Parejko via Slack)
- Suggestion to deploy Kubernetes on PDAC; it is assumed that this is being handled through the rolling-wave (EVMS) process
- Tools for parallel programming in the batch computing environment (GNU parallel and others)
Change Process Notes
- Change process is being refined based on experience and feedback from exercising it over the past month.
Problem Management
Report format under development
Interactions
Protracted discussion with John Swinbank, Kian-Tat Lim & Simon Krughoff re. the need to be able to retrieve items purged from the GPFS /scratch partition for some short period of time. See IHS-613.
Unknown User (jmatt) proposes in ITRFC-10 that the ITSC consider making recommendations for container and container orchestration best practices. It's unclear if
Next meeting
Last PDAC meeting
- Major topics of discussion were need to finish this
Next PDAC meeting
Suspended until after the first of the year since Jeff is in Chile. However, I’m linked in to Chile IT.
The meeting was focused on the upcoming maintenance event.
Other business
Proposed that we install a standard InfluxDB/Telegraf/Prometheus stack on the standard Nebula images, and install a monitoring system in OpenStack to serve up the data and dashboards.
Action Items
New
- Unknown User (pdomagala), get cell phone numbers
- Unknown User (pdomagala), contact Gregory Dubois-Felsmann and/or Unknown User (xiuqin) for approval (phone)
- Unknown User (pdomagala), start monitoring the RFC project
- Unknown User (pdomagala), interface between RFC process and LDMCR process
From last week