Date:   

Attendees: Unknown User (pdomagala), Margaret Gelman, Donald Petravick

Scheduled Maintenance

See the LSST Service Status Page

Next lsst-dev maintenance event 

As-is Services

Incidents

Created & Resolved:

Key | Short Description | Summary | Status
IHS-748 | "Stale file handle" on lsst-dev01 | GPFS fileserver box CPU overloaded due to vendor cron jobs run amok. Fixed in < 1 hr. The fix is permanent. | RESOLVED

Resolved:

Key | Short Description | Summary
IHS-748 | "Stale file handle" on lsst-dev01 | GPFS fileserver box CPU overloaded due to vendor cron jobs run amok. Fixed in < 1 hr. The fix is permanent.

Requests

Created:

Key | Short Description | Summary | Status
IHS-749 | Batch Compute Production-Size Runs | Need for dedicated batch compute capacity for upcoming data production runs. Implementation of multiple queues in the batch compute environment will satisfy the need (IHS-612). | IN PROG.
IHS-752 | LSST accounts needed | Provision accounts/access for new employee Kimberly Blum: lsst-dev, lsst-dev-db, Nebula | DONE
IHS-755 | Addition of new data in /datasets/hsc | Hsin-Fang Chiang added new calibration data per RFC-440 | DONE
IHS-760 | Please mount GPFS /project on lsst-demo | Firefly on the lsst-demo server needs access to /project files. | DONE

Resolved:

Key | Short Description | Summary
IHS-714 | Update docs on verification cluster | Removed numerous confusing references to support email addresses and replaced them with 2 persistent links to Jira IHS ticket creation: Create a Request, Report an Incident
IHS-717 | OpenStack Installation / Configuration | Andres Villalobos (sysadmin, La Serena) requested assistance setting up an OpenStack cluster in Chile. We have arranged for a student with expertise in this area, Antonio Abinader, to help him.
IHS-720 | Setup (or reset) Nebula Account for John Gates | Done
IHS-752 | LSST accounts needed | Done
IHS-755 | Addition of new data in /datasets/hsc | Done
IHS-760 | Please mount GPFS /project on lsst-demo | Mounted NFS read-only for file system security reasons.

Change Management

This process primarily targets requests that can be handled with current level of effort (LOE) resources.  This process is also designed to detect and redirect items to the EVMS process if they exceed LOE resources.

Successful changes proceed through 5 stages:

1. Business Case & T/CAM Concurrence: Check that the submitter has stated a plausible business case and the relevant T/CAM agrees
2. Feasibility: Is the change well-formulated, address a project need and
3. Planning: A detailed implementation plan is created which takes into account impacts, resource needs, testing and verification.
4. Insertion: The plan is executed to implement the change.
5. Assessment: Verification of successful change, issues analysis, documentation and close-out.


Open Change Requests

Key | Summary | Process Stage† | Reporter | P | Created | Status
IHS-612 | Implement debug and normal queues for developers on the verification cluster | Complete | Yusra AlSayyad | Major | 16/Nov/17 | DONE
IHS-580 | DM developers need a build/test environment that supports docker containers | Feasibility | Joshua Hoblitt | Minor | 02/Nov/17 | Determining exact needs and if current capabilities are adequate
IHS-576 | Configure slurm to accept jobs to use only partial nodes | Planning | Tim Morton | Major | 02/Nov/17 | Assessing the impact to other use cases
IHS-488 | Increase limits and swap space for qserv pdac | Feasibility | John Gates | | 04/Oct/17 | Discussion in several infrastructure & PDAC meetings. Fritz Mueller has the action item of needs-gathering. Waiting for feedback.

Heard on the Street This Week, but no Ticket Filed

  • New

    • Gregory expressed a need for additional PDAC compute capacity to support Science Platform development.  When he finishes the needs assessment, he'll file a request.

  • Previous

    • It was suggested that per-user storage usage for each shared fileset be made available.  Preferably readable by any DM member.

    • Several users expressed a desire to have the Intel compiler suite (icc) available on lsst-dev
    • Increase ssh idle session timeout, which is currently 1 hr. (John Parejko via Slack)
    • Suggestion to deploy Kubernetes on PDAC; it is assumed that this is being handled through the rolling-wave (EVMS) process
    • Tools for parallel programming in batch computing environment (GNU parallel and others)

Problem Management

Report format under development

Interactions

T/CAM interactions


ITSC

  • Next meeting  
  • I've filed ITRFC-11 (Propose Changes to the LSST account/access request instructions) to bring the instructions in line with our process.

PDAC

  • Last PDAC meeting 11/16/2017.
  • Only item of note is the possibility of a request for additional PDAC compute capacity.

Summit-base Tiger Team

  • This working group has been rebooted.  Next meeting 

Infrastructure

  • No significant outstanding topics
  • Next meeting 
    • Will brief the group on changes to the docs and to the incident, request and change processes

Other business

(None)

Tasks


