Note that the December maintenance window is moved from 12/21 to 12/14.
0 created, 0 resolved.
Discussion of Notable Issues
Unexpected reboot of lsst-qserv-db16 (IHS-606).
Spontaneous reboots of PDAC nodes have been an ongoing issue since Nov. 14. The last event was on Nov. 23. Datacenter infrastructure has been ruled out as a cause. The problem has been determined to lie with the servers themselves, likely on the system board.
Despite this issue, Igor Gaponenko was able to complete his data ingests and meet his Nov. 30 milestone.
On Monday of this week, our engineering team and vendor tech support identified what they believe is the likely cause, and we initiated an emergency change request. The fix entails firmware updates, which are currently being installed. Nodes will be unavailable while they are upgraded. This could be a lengthy process.
None created or resolved
This process primarily targets requests that can be handled with current level of effort (LOE) resources. This process is also designed to detect and redirect items to the EVMS process if they exceed LOE resources.
Successful changes proceed through 5 stages:
|1||Business Case & T/CAM Concurrence||Check that the submitter has stated a plausible business case and the relevant T/CAM agrees|
|2||Feasibility||Is the change well-formulated, address a project need and|
|3||Planning||A detailed implementation plan is created which takes into account impacts, resource needs, testing and verification.|
|4||Insertion||The plan is executed to implement the change.|
|5||Assessment||Verification of successful change, issues analysis, documentation and close-out.|
Open Change Requests
|Key||Summary||Process Stage†||Reporter||P||Created||Resource track||Status|
(will not implement)
Heard on the Street This Week, but no Ticket Filed
- Several users expressed a desire to have the Intel compiler suite (icc) available on lsst-dev
- Increase ssh idle session timeout, which is currently 1 hr. (John Parejko via Slack)
- Suggestion to deploy Kubernetes on PDAC; this is assumed to be handled through the rolling-wave (EVMS) process
- Tools for parallel programming in the batch computing environment (GNU parallel and others)
Change Process Notes
- LDF Service Management Operations Meeting Notes are now posted on the LSST Confluence site.
- The change process is being exercised, refined and socialized with T/CAMs as well as submitters
Report format under development
- Write training .ppt docs on service manager duties that describe what needs to be done on a daily basis. The first covers daily duties.
- Define the LDMCR closure process
- Revise the change process so that phase 1 includes linking in the T/CAM at the discretion of the CM
- Document the change process, issue types, etc.
From last week