Scheduled Maintenance
Next lsst-dev maintenance event is Thurs. Feb. 14. Notably, fixes for the Spectre and Meltdown vulnerabilities may have been tested and released (and not retracted) by then.
See the LSST Service Status Page for details.
As-is Services
(since 12/20/2017)
Incidents
6 created, 5 resolved.
Created:
Resolved:
IHS-703 | |
IHS-695 | Jobs stuck with "ReqNodeNotAvail" despite there being more than enough nodes available |
IHS-687 | |
IHS-673 | |
IHS-672 | |
Discussion of Notable Issues
With regard to IHS-703, the root cause was determined to be network congestion. On top of normal network load, a sync between GPFS systems was running in support of provisioning the new storage system. The problem was easily solved, but the takeaway was that up-front effort spent architecting these systems will yield long-term savings in effort, as well as better performance and reliability. At next week's sysadmin meeting, Michelle B. will lead a brainstorming session on the topic.
Requests
6 created, 3 resolved.
Created:
IHS-720 | |
IHS-717 | |
IHS-714 | |
IHS-708 | Missing libGL.so.1 on the compute nodes of the Verification Cluster |
IHS-699 | |
IHS-684 | |
Resolved:
Change Management
This process primarily targets requests that can be handled with current level-of-effort (LOE) resources. It is also designed to detect items that exceed LOE resources and redirect them to the EVMS process.
Successful changes proceed through 5 stages:
Stage | Name | Description |
---|---|---|
1 | Business Case & T/CAM Concurrence | Check that the submitter has stated a plausible business case and that the relevant T/CAM agrees. |
2 | Feasibility | Is the change well-formulated, and does it address a project need? |
3 | Planning | A detailed implementation plan is created that takes into account impacts, resource needs, testing and verification. |
4 | Insertion | The plan is executed to implement the change. |
5 | Assessment | Verification of successful change, issue analysis, documentation and close-out. |
Open Change Requests
Key | Summary | Process Stage† | Reporter | P | Created | Status |
---|---|---|---|---|---|---|
IHS-612 | Implement debug and normal queues for developers on the verification cluster | Planning | Yusra AlSayyad | | 16/Nov/17 | Anticipate implementation this week |
IHS-580 | DM developers need a build/test environment that supports Docker containers | Feasibility | Joshua Hoblitt | | 02/Nov/17 | |
IHS-576 | | Planning | Tim Morton | | 02/Nov/17 | Assessing the impact on other use cases |
IHS-488 | | Feasibility | John Gates | | 04/Oct/17 | Discussed in several infrastructure & PDAC meetings. Fritz Mueller has the action item for needs-gathering. Waiting for feedback. |
Heard on the Street This Week, but no Ticket Filed
New
None
Previous
- It was suggested that per-user storage usage for each shared fileset be made available, preferably readable by any DM member.
- Several users expressed a desire to have the Intel compiler suite (icc) available on lsst-dev
- Increase the ssh idle session timeout, which is currently 1 hour (John Parejko via Slack)
- Suggestion to deploy Kubernetes on PDAC; it is assumed that this is being handled through the rolling-wave (EVMS) process
- Tools for parallel programming in the batch computing environment (GNU Parallel and others)
Change Process Notes
- Change process is being further developed
- Change process is being exercised, refined and socialized with T/CAMs as well as submitters
Problem Management
Report format under development
Interactions
Mainly Slack discussions regarding infrastructure operational details (e.g., patching, system status).
The following RFCs have been adopted:
Notable is
which is also a significant project (EVMS) deliverable. It's unclear how ITSC and project objectives are related.
Next meeting
Last PDAC meeting 11/16/2017.
Meetings are still suspended, as implementation has taken precedence over planning. Pinged Jeff Kantor yesterday about this, and he will be restarting the meetings.
Next meeting
Other business
(None)
Action Items
New
From last week