Date:
Attendees: Unknown User (pdomagala), Margaret Gelman, Donald Petravick, Unknown User (kaylynr)
Scheduled Maintenance
See the LSST Service Status Page
Note that the December maintenance window is moved from 12/21 to 12/14.
As-is Services
Incidents
0 created, 0 resolved.
Discussion of Notable Issues
Unexpected reboot of lsst-qserv-db16 (IHS-606).
Spontaneous reboots of PDAC nodes have been an ongoing issue since Nov. 14. The last event was on Nov. 23. Datacenter infrastructure has been ruled out as a cause. The problem has been traced to the servers themselves, most likely the system board.
Despite this issue, Igor Gaponenko was able to complete his data ingests and meet his Nov. 30 milestone.
On Monday of this week, our engineering team and vendor tech support identified what they believe is the cause, and we initiated an emergency change request. The fix requires firmware updates, which are currently being installed. Each node will be unavailable while it is upgraded, so this could be a lengthy process.
Requests
None created or resolved
Change Management
This process primarily targets requests that can be handled with the current level of effort (LOE) resources. It is also designed to detect items that exceed LOE resources and redirect them to the EVMS process.
Successful changes proceed through 5 stages:
Stage | Name | Description |
---|---|---|
1 | Business Case & T/CAM Concurrence | Check that the submitter has stated a plausible business case and that the relevant T/CAM agrees. |
2 | Feasibility | Is the change well-formulated, and does it address a project need and |
3 | Planning | A detailed implementation plan is created that takes into account impacts, resource needs, testing and verification. |
4 | Insertion | The plan is executed to implement the change. |
5 | Assessment | Verification of successful change, issues analysis, documentation and close-out. |
Open Change Requests
Key | Summary | Process Stage† | Reporter | P | Created | Resource track | Status |
---|---|---|---|---|---|---|---|
IHS-580 | DM developers need a build/test environment that supports docker containers | Feasibility | Joshua Hoblitt | | 02/Nov/17 | | |
IHS-576 | | Planning | Tim Morton | | 02/Nov/17 | | |
| Implement debug and normal queues for developers on the verification cluster | Planning | Yusra AlSayyad | | 16/Nov/17 | LOE | |
IHS-595 | | Closed (inserted) | John Parejko | | 08/Nov/17 | LOE | |
IHS-613 | | Closed (will not implement) | John Parejko | | 16/Nov/17 | - | |
Heard on the Street This Week, but no Ticket Filed
New
- Several users expressed a desire to have the Intel compiler suite (icc) available on lsst-dev
- Increase the SSH idle session timeout, which is currently 1 hr (John Parejko via Slack)
Previous
- Suggestion to deploy Kubernetes on PDAC; this is assumed to be handled through the rolling-wave (EVMS) process
- Tools for parallel programming in a batch computing environment (GNU Parallel and others)
Change Process Notes
- LDF Service Management Operations Meeting Notes are now posted on the LSST Confluence site.
- The change process is being exercised, refined and socialized with T/CAMs as well as submitters
Problem Management
Report format under development
Interactions
None
Next meeting
Next PDAC meeting is tomorrow.
Suspended until after the first of the year since Jeff is in Chile. However, I’m linked in to Chile IT.
Next meeting
Other business
(None)
Action Items
New
- Write training slide decks (.ppt) on service manager duties that describe what needs to be done on a daily basis. The first will cover daily duties.
- Define how the LDMCR closure process works
- Revise the change process to include, in phase 1, linking in the T/CAM at the discretion of the CM
- Document the change process, issue types, etc.
From last week