
Scheduled Maintenance

See the LSST Service Status Page

Support over the holiday break  to  

Services will be operational. The following have been identified as essential during this period:
    • Verification Cluster (slurm)
    • GPFS Storage
Please report problems via the JIRA IHS queue.

As-is Services


    • 1 created, 2 resolved.

      • Created
      • Resolved:

Discussion of Notable Issues

    • IHS-663

The maintenance event ran into problems. Mitigations were put in place and systems were operational by 19:00. The root cause was determined to be policy-based routing (networking). Systems and network engineering are analyzing the event, identifying lessons learned, and formulating changes to prevent recurrence.

    • IHS-606

Users report no problems.  There have been no spurious reboots detected.  The issue has been closed with Igor Gaponenko's concurrence.


    • 1 created, 1 resolved



Change Management

This process primarily targets requests that can be handled with current level of effort (LOE) resources.  This process is also designed to detect and redirect items to the EVMS process if they exceed LOE resources.

Successful changes proceed through 5 stages: 


  1. Business Case & T/CAM Concurrence: Check that the submitter has stated a plausible business case and the relevant T/CAM agrees
  2. Feasibility: Is the change well-formulated, address a project need and
  3. Planning: A detailed implementation plan is created which takes into account impacts, resource needs, testing and verification.
  4. Insertion: The plan is executed to implement the change.
  5. Assessment: Verification of successful change, issues analysis, documentation and close-out.
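
As a rough illustration, the five stages can be modeled as an ordered enumeration with a single "advance" step. This is only a sketch: the names `Stage` and `next_stage`, the `within_loe` flag, and the `"EVMS"` marker are hypothetical and are not part of any existing tooling. The redirect simply mirrors the statement above that items exceeding LOE resources go to the EVMS process.

```python
from enum import IntEnum


class Stage(IntEnum):
    """The five change-management stages, in the order listed above."""
    BUSINESS_CASE = 1  # Business Case & T/CAM Concurrence
    FEASIBILITY = 2
    PLANNING = 3
    INSERTION = 4
    ASSESSMENT = 5


def next_stage(current, within_loe=True):
    """Return what follows `current` for a change request.

    Returns the string "EVMS" (hypothetical marker) when the request
    exceeds LOE resources, None once Assessment is complete, and
    otherwise the next Stage in sequence.
    """
    if not within_loe:
        return "EVMS"
    if current is Stage.ASSESSMENT:
        return None
    return Stage(current + 1)
```

For example, `next_stage(Stage.PLANNING)` gives `Stage.INSERTION`, while `next_stage(Stage.PLANNING, within_loe=False)` gives the EVMS marker.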

Open Change Requests

    • Configure slurm to accept jobs to use only partial nodes
      Reporter: Tim Morton. Priority: Major. Created: 02/Nov/17.
      Status: This change has been approved and is tentatively scheduled for early CY18.

    • DM developers need a build/test environment that supports docker containers
      Reporter: Joshua Hoblitt. Priority: Minor. Created: 02/Nov/17.
      Status: Use case and requirements are being gathered. Unknown User (pdomagala) met with the PDAC working group last week. They requested further detail on the specifications and timing of the FY18 kubernetes/docker initiative. If that service, even an early implementation, is ready near-term, stop-gap measures may be unnecessary.

    • Implement debug and normal queues for developers on the verification cluster
      Reporter: Yusra AlSayyad. Priority: Major. Created: 16/Nov/17.
      Status: Currently being planned and tested. Tentatively scheduled for early CY18.

    • RFC-423: Allow ssh access to verification cluster worker nodes
      Process Stage: Feasibility. Reporter: Simon Krughoff. Created: 18/Dec/17.
      Status: Proposed. 5 days left on comment period.

Heard on the Street This Week, but no Ticket Filed

  • New

    • None

  • Previous

    • It was suggested that per-user storage usage for each shared fileset be made available.  Preferably readable by any DM member.

    • Several users expressed a desire to have the Intel compiler suite (icc) available on lsst-dev.
    • Increase ssh idle session timeout, which is currently 1 hr. (John Parejko via Slack) 
    • Suggestion to deploy kubernetes on PDAC. It is assumed that this is being handled through the rolling-wave (EVMS) process.
    • Tools for parallel programming in the batch computing environment (GNU parallel and others)

Change Process Notes

Problem Management

Report format under development


  1. T/CAM interactions
    1. Discussion with Unknown User (xiuqin) and Gregory Dubois-Felsmann on Christine Banek's request for PDAC access (IHS-654). Both approved and noted so in the ticket. Access was given and the issue was closed.
    2. Engaged all managers on activities over the holiday break that might need support.
  2. ITSC
    1. Last meeting . Nothing significant.

  3. PDAC

    1. Last PDAC meeting 11/16/2017

    2. Discussion on IHS-488 - Increase limits and swap space for qserv pdac: It's unclear what the use cases are and what the specific requirements might be.  Unknown User (pdomagala) created a subtask for Fritz Mueller to look into this before proceeding.

  4. Summit-base Tiger Team

    1. Suspended until after the first of the year since Jeff is in Chile.  However, Unknown User (pdomagala) has been included in the North-South IT support meetings.

  5. Infrastructure

    1. Last meeting .  Nothing significant.

  6. North-South IT Support

    1. Unknown User (pdomagala) has been included in the North-South IT support meetings.

Other business


Action Items


From last week