Scheduled Maintenance
The next lsst-dev maintenance event is Thurs. Feb. 14. Notably, fixes for the Spectre and Meltdown vulnerabilities may have been tested and released (and not retracted) by then.
See the LSST Service Status Page for details.
As-is Services
(since 12/20/2017)
Incidents
6 created, 5 resolved.
Created:
Resolved:
IHS-703 | |
IHS-695 | Jobs stuck with "ReqNodeNotAvail" despite there being more than enough nodes available |
IHS-687 | |
IHS-673 | |
IHS-672 |
Discussion of Notable Issues
With regard to IHS-703, the root cause was determined to be network congestion. It has been addressed by network tuning, but architectural changes are warranted and will be discussed internally.
Requests
6 created, 3 resolved.
Created:
IHS-720 | |
IHS-717 | |
IHS-714 | |
IHS-708 | Missing libGL.so.1 on the compute nodes of the Verification Cluster |
IHS-699 | |
IHS-684 |
Resolved:
Change Management
This process primarily targets requests that can be handled with the current level-of-effort (LOE) resources. It is also designed to detect items that exceed LOE resources and redirect them to the EVMS process.
Successful changes proceed through five stages:
Stage | Name | Description |
---|---|---|
1 | Business Case & T/CAM Concurrence | Check that the submitter has stated a plausible business case and that the relevant T/CAM agrees. |
2 | Feasibility | Is the change well formulated, and does it address a project need and |
3 | Planning | A detailed implementation plan is created which takes into account impacts, resource needs, testing and verification. |
4 | Insertion | The plan is executed to implement the change. |
5 | Assessment | Verification of successful change, issues analysis, documentation and close-out. |
Open Change Requests
Key | Summary | Process Stage† | Reporter | P | Created | Status |
---|---|---|---|---|---|---|
IHS-612 | Implement debug and normal queues for developers on the verification cluster | Planning | Yusra AlSayyad | | 16/Nov/17 | Anticipate implementation this week |
IHS-580 | DM developers need a build/test environment that supports docker containers | Feasibility | Joshua Hoblitt | | 02/Nov/17 | |
IHS-576 | | Planning | Tim Morton | | 02/Nov/17 | Assessing the impact on other use cases |
IHS-488 | | Feasibility | John Gates | | 04/Oct/17 | Discussion in several infrastructure & PDAC meetings. Fritz Mueller has the action item of needs-gathering. Waiting for feedback. |
Heard on the Street This Week, but no Ticket Filed
New
None
Previous
- It was suggested that per-user storage usage for each shared fileset be made available, preferably readable by any DM member.
- Several users expressed a desire to have the Intel compiler suite (icc) available on lsst-dev.
- Increase the ssh idle-session timeout, which is currently 1 hr. (John Parejko via Slack)
- Suggestion to deploy Kubernetes on PDAC; it is assumed that this is being handled through the rolling-wave (EVMS) process.
- Tools for parallel programming in the batch computing environment (GNU Parallel and others)
Change Process Notes
- Change process is being further developed
- Change process is being exercised, refined and socialized with T/CAMs as well as submitters
Problem Management
Report format under development
Interactions
Mainly Slack discussions regarding infrastructure operational details (e.g., patching and system status).
The following RFCs have been adopted:
Notable is
which is also a project (EVMS) deliverable. It's unclear how ITSC and project objectives are related, but there has been no movement on this within the ITSC.
Next meeting
All systems operational. The team continues work on milestone testing for the F17 "load and serve the WISE single-epoch data" effort.
Next PDAC meeting
Still suspended, as implementation has taken precedence over planning. Pinged Jeff Kantor yesterday about this, and he will be restarting the meetings.
Patches for the Spectre and Meltdown vulnerabilities were applied to the user-facing lsst-dev servers. The sysadmin and security teams assess that the remaining systems are not in any immediate peril, and they will wait for patches that are tested and supplied through official vendor channels.
Next meeting
Other business
(None)
Action Items
New
From last week