As-is Services
Incidents
Created & Resolved:
Key | Short Description | Summary | Status |
---|---|---|---|
IHS-748 | | GPFS file server CPU was overloaded by runaway vendor cron jobs. Fixed in under 1 hr; the fix is permanent. | RESOLVED |
Resolved:
Key | Short Description | Summary |
---|---|---|
IHS-748 | | GPFS file server CPU was overloaded by runaway vendor cron jobs. Fixed in under 1 hr; the fix is permanent. |
Requests
Created:
Key | Short Description | Summary | Status |
---|---|---|---|
IHS-749 | | Dedicated batch compute capacity is needed for upcoming data production runs. Implementing multiple queues in the batch compute environment (IHS-612) will satisfy the need. | IN PROG. |
IHS-752 | | | DONE |
IHS-755 | | Hsin-Fang Chiang added new calibration data per RFC-440. | DONE |
IHS-760 | | Firefly on the lsst-demo server needs access to /project files. | DONE |
Resolved:
Key | Short Description | Summary |
---|---|---|
IHS-714 | | Removed numerous confusing references to support email addresses and replaced them with 2 persistent links to Jira IHS ticket creation: |
IHS-717 | | Andres Villalobos (sysadmin, La Serena) requested assistance setting up an OpenStack cluster in Chile. We have arranged for Antonio Abinader, a student with expertise in this area, to help him. |
IHS-720 | | Done |
IHS-752 | | Done |
IHS-755 | | Done |
IHS-760 | | Mounted NFS read-only for file system security reasons. |
Change Management
This process primarily targets requests that can be handled with current level-of-effort (LOE) resources. It is also designed to detect items that exceed LOE resources and redirect them to the EVMS process.
Successful changes proceed through 5 stages:
Stage | Name | Description |
---|---|---|
1 | Business Case & T/CAM Concurrence | Check that the submitter has stated a plausible business case and that the relevant T/CAM agrees. |
2 | Feasibility | Is the change well formulated, does it address a project need, and |
3 | Planning | A detailed implementation plan is created that accounts for impacts, resource needs, testing, and verification. |
4 | Insertion | The plan is executed to implement the change. |
5 | Assessment | Verification of successful change, issues analysis, documentation, and close-out. |
Open Change Requests
Key | Summary | Process Stage† | Reporter | P | Created | Status |
---|---|---|---|---|---|---|
IHS-612 | Implement debug and normal queues for developers on the verification cluster | Complete | Yusra AlSayyad | | 16/Nov/17 | DONE |
IHS-580 | DM developers need a build/test environment that supports docker containers | Feasibility | Joshua Hoblitt | | 02/Nov/17 | |
IHS-576 | | Planning | Tim Morton | | 02/Nov/17 | Assessing the impact to other use cases |
IHS-488 | | Feasibility | John Gates | | 04/Oct/17 | Discussion in several infrastructure & PDAC meetings. Fritz Mueller has the action item of needs-gathering. Waiting for feedback. |
Heard on the Street This Week, but no Ticket Filed
New
Gregory expressed a need for additional PDAC compute capacity to support Science Platform development. When he finishes the needs assessment, he'll file a request.
Previous
- It was suggested that per-user storage usage for each shared fileset be made available, preferably readable by any DM member.
- Several users expressed a desire to have the Intel compiler suite (icc) available on lsst-dev
- Increase the ssh idle session timeout, which is currently 1 hr. (John Parejko via Slack)
- Suggestion to deploy Kubernetes on PDAC; it is assumed that this is being handled through the rolling-wave (EVMS) process
- Tools for parallel programming in the batch computing environment (GNU Parallel and others)
Problem Management
Report format under development
Interactions
T/CAM interactions
- Next meeting
- I've filed ITRFC-11 Propose Changes to the LSST account/access request instructions to bring the instructions in line with our process.
- Last PDAC meeting 11/16/2017.
- Only item of note is the possibility of a request for additional PDAC compute capacity.
- This working group has been rebooted. Next meeting
- No significant outstanding topics
- Next meeting
- Will brief the group on changes to the docs and to the incident, request, and change processes
Other business
(None)
Tasks
- pdomagala: figure out the allocation notice process
- pdomagala: meet with Mike P. to discuss the Chilean SLA
- pdomagala: create commentary on the ICI change process with LSST roles and testing expectations specified. Define its relationship to verification, release, and early lifetime support.
- pdomagala: regarding the incident process and tickets, build a schema that facilitates easy assimilation of service metrics