
Date:   

Attendees: Unknown User (pdomagala), Margaret Gelman, Donald Petravick

NOTE: statistics cover the past 2 weeks

Scheduled Maintenance

See the LSST Service Status Page

  • Note that the December maintenance window is moved from 12/21 to 12/14.
    • Nebula will not be affected. The L1 test stand would have had OS upgrades early Thursday morning, but I've asked the admins to hold off until January in order to provide maximum stability during the early integration exercise.
  • 2017-12-22 to 2018-01-01 (inclusive) is a University holiday period. Services will be operational. NCSA will respond to incidents based on business criticality. During the holidays, incidents should be submitted as JIRA IHS tickets, as we will not be monitoring Slack channels as closely as normal.


As-is Services

Incidents

  • 1 created, 1 resolved.

Created & Resolved:

Discussion of Notable Issues

      • Unexpected reboot of lsst-qserv-db16 (IHS-606).

        Firmware upgrades were successful and all nodes were returned to service Thur. 11/30.  Since then, there have been no unplanned reboots.

Requests

    • 4 created, 3 resolved

Change Management

This process primarily targets requests that can be handled with current level of effort (LOE) resources.  This process is also designed to detect and redirect items to the EVMS process if they exceed LOE resources.

Changes proceed through 5 stages before being closed:

  1. Initial Assessment: Check that the submitter has stated a plausible business case and that the relevant T/CAM agrees.
  2. Feasibility Assessment: Check that the change is well formulated, addresses a project need, and is cost-effective.
  3. Planning: A detailed implementation plan is created that takes into account impacts, resource needs, testing, and verification.
  4. Implementation: The plan is executed to implement the change.
  5. Assessment: Verify that the change was successful and analyze any issues.
  6. Closed: Document and formally close out the request.
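
For anyone scripting against the ticket data, the stage progression above can be sketched as a small state machine. This is only an illustrative sketch: the names `Stage`, `next_stage`, and `route` are hypothetical and do not correspond to any existing JIRA workflow configuration or tool, and the single `fits_loe` flag is an assumed simplification of the LOE check.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Change-request stages, numbered as in the list above (hypothetical names)."""
    INITIAL_ASSESSMENT = 1
    FEASIBILITY_ASSESSMENT = 2
    PLANNING = 3
    IMPLEMENTATION = 4
    ASSESSMENT = 5
    CLOSED = 6


def next_stage(stage: Stage) -> Stage:
    """Return the stage that follows `stage`; CLOSED is terminal."""
    if stage is Stage.CLOSED:
        raise ValueError("a closed change request cannot advance")
    return Stage(stage + 1)


def route(fits_loe: bool) -> str:
    """Assumed routing rule: work that exceeds LOE resources is redirected to EVMS."""
    return "change management" if fits_loe else "EVMS"
```

For example, `next_stage(Stage.PLANNING)` yields `Stage.IMPLEMENTATION`, and `route(False)` yields `"EVMS"`.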

Open Change Requests

| Key | Summary | Process Stage† | Reporter | Priority | Created | Status |
|---|---|---|---|---|---|---|
| IHS-576 | Configure slurm to accept jobs to use only partial nodes | Planning | Tim Morton | Major | 02/Nov/17 | This change has been approved and is tentatively scheduled for early CY18. |
| IHS-580 | DM developers need a build/test environment that supports docker containers | Planning | Joshua Hoblitt | Minor | 02/Nov/17 | Use case and requirements are being gathered. Unknown User (pdomagala) will discuss this with the PDAC working group this Thursday. |
| IHS-612 | Implement debug and normal queues for developers on the verification cluster | Planning | Yusra AlSayyad | Major | 16/Nov/17 | Currently being planned and tested. Tentatively scheduled for deployment before 22/Dec/17. |
| IHS-638 | Install a tex distro on lsst-dev | Closed | Merlin Fisher-Levine | Minor | 04/Dec/17 | Approved by M. Butler and completed on 08/Dec/17. |

Heard on the Street This Week, but no Ticket Filed

  • New

    • It was suggested that per-user storage usage for each shared fileset be made available, preferably readable by any DM member.

  • Previous

    • Several users expressed a desire to have the Intel compiler suite (icc) available on lsst-dev.
    • Increase the ssh idle session timeout, which is currently 1 hr. (John Parejko via Slack)
    • Suggestion to deploy Kubernetes on PDAC. It is assumed that this is being handled through the rolling-wave (EVMS) process.
    • Tools for parallel programming in the batch computing environment (GNU parallel and others).

Change Process Notes

  • The change process is being refined based on experience and feedback from exercising it over the past month.

Problem Management

Report format under development

Interactions

  1. T/CAM interactions

    1. Protracted discussion with John Swinbank, Kian-Tat Lim, and Simon Krughoff re. the need to be able to retrieve items purged from the GPFS /scratch partition for some short period of time. See IHS-613.

  2. ITSC

    1. Unknown User (jmatt) proposes in ITRFC-10 that the ITSC consider making recommendations for container and container orchestration best practices.

      Next meeting

  3. PDAC

    1. Next PDAC meeting

  4. Summit-base Tiger Team

    1. Suspended until after the first of the year since Jeff is in Chile.  However, I’m linked in to Chile IT.

  5. Infrastructure

    1. The meeting was focused on the upcoming maintenance event.

Other business

Proposed that we install a standard Influx/Telegraf/Prometheus stack on the standard Nebula images and set up a monitoring system in OpenStack to serve up the data and dashboards.

Action Items

New

From last week