Scheduled Maintenance

See the LSST Service Status Page

Support over the holiday break: 

LSST Development Servers

break  to  

Services will be operational. The following have been identified as essential during this period:
  • lsst-dev
  • lsst-xfer
  • Verification Cluster (slurm)
  • Nagios monitoring
  • GPFS Storage
Please report problems via the JIRA IHS queue.

As-is Services

Incidents

    • 1 created, 2 resolved.

      • Created
      • Resolved:


Discussion of Notable Issues

    • IHS-663

The maintenance event ran into problems.  Mitigations were put in place and systems were operational at 19:00.  The root cause was determined to be policy-based routing (networking).  Systems and network engineering are analyzing the event, identifying lessons learned, and formulating changes to prevent recurrence.


    • IHS-606

Users report no problems.  There have been no spurious reboots detected.  The issue has been closed with Igor Gaponenko's concurrence.

Requests

    • 1 created, 1 resolved

Created

Resolved



Change Management

This process primarily targets requests that can be handled with current level of effort (LOE) resources.  This process is also designed to detect and redirect items to the EVMS process if they exceed LOE resources.

Successful changes proceed through 5 stages: 

  1. Business Case & T/CAM Concurrence: Check that the submitter has stated a plausible business case and the relevant T/CAM agrees
  2. Feasibility: Is the change well-formulated, address a project need and
  3. Planning: A detailed implementation plan is created which takes into account impacts, resource needs, testing and verification.
  4. Insertion: The plan is executed to implement the change.
  5. Assessment: Verification of successful change, issues analysis, documentation and close-out.
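The five stages can be read as a simple linear workflow, with a redirect to EVMS for changes that exceed LOE resources. The following is only an illustrative sketch: the names `Stage`, `next_stage`, and the `exceeds_loe` flag are hypothetical and do not describe any actual tooling used for this process.

```python
from enum import IntEnum
from typing import Optional


class Stage(IntEnum):
    """The five change-management stages, in order."""
    BUSINESS_CASE = 1  # Business Case & T/CAM Concurrence
    FEASIBILITY = 2
    PLANNING = 3
    INSERTION = 4
    ASSESSMENT = 5


def next_stage(current: Stage, exceeds_loe: bool = False) -> Optional[Stage]:
    """Return the stage that follows `current`.

    Returns None when the change leaves this process: either it has
    completed Assessment, or it exceeds LOE resources and is redirected
    to the EVMS process (hypothetical flag, for illustration only).
    """
    if exceeds_loe or current is Stage.ASSESSMENT:
        return None
    return Stage(current + 1)
```

For example, `next_stage(Stage.PLANNING)` yields `Stage.INSERTION`, while any call with `exceeds_loe=True` yields `None` to signal the hand-off to EVMS.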


Open Change Requests


Key | Summary | Process Stage† | Reporter | P | Created | Status
IHS-576 | Configure slurm to accept jobs to use only partial nodes | Planning | Tim Morton | Major | 02/Nov/17 | This change has been approved and is tentatively scheduled for early CY18.
IHS-580 | DM developers need a build/test environment that supports docker containers | Feasibility | Joshua Hoblitt | Minor | 02/Nov/17 | Use case and requirements are being gathered. Unknown User (pdomagala) met with the PDAC working group last week. They requested further detail on the specifications and timing of the FY18 kubernetes/docker initiative. If that service, even an early implementation, is ready near-term, stop-gap measures may be unnecessary.
IHS-612 | Implement debug and normal queues for developers on the verification cluster | Planning | Yusra AlSayyad | Major | 16/Nov/17 | Currently being planned and tested. Tentatively scheduled for early CY18.
RFC-423 | Allow ssh access to verification cluster worker nodes | Feasibility | Simon Krughoff | | 18/Dec/17 | Proposed. 5 days left on comment period.

Heard on the Street This Week, but no Ticket Filed

  • New

    • None

  • Previous

    • It was suggested that per-user storage usage for each shared fileset be made available, preferably readable by any DM member.
    • Several users expressed a desire to have the Intel compiler suite (icc) available on lsst-dev.
    • Increase ssh idle session timeout, which is currently 1 hr. (John Parejko via Slack)
    • Suggestion to deploy kubernetes on PDAC. It is assumed that this is being handled through the rolling-wave (EVMS) process.
    • Tools for parallel programming in batch computing environment (GNU parallel and others)

Change Process Notes

Problem Management

Report format under development

Interactions

  1. T/CAM interactions
    1. Discussion with Unknown User (xiuqin) and Gregory Dubois-Felsmann on Unknown User (cbanek)'s request for PDAC access (IHS-654).  Both approved and noted so in the ticket.  Access was given and the issue was closed.
    2. Engaged all managers on activities over the holiday break that might need support.
  2. ITSC
    1. Last meeting: nothing significant.
  3. PDAC
    1. Last PDAC meeting 11/16/2017
    2. Discussion on IHS-488 - Increase limits and swap space for qserv pdac: It's unclear what the use cases are and what the specific requirements might be.  Unknown User (pdomagala) created a subtask for Fritz Mueller to look into this before proceeding.
  4. Summit-base Tiger Team
    1. Suspended until after the first of the year since Jeff is in Chile.
  5. Infrastructure
    1. Last meeting: nothing significant.

Other

  1. North-South IT Support

    1. Unknown User (pdomagala) has been included in the North-South IT support meetings.

Other business

(None)

Action Items

New


From last week