Date:   

Attendees: Unknown User (pdomagala), Margaret Gelman, Donald Petravick


Scheduled Maintenance

See the LSST Service Status Page



As-is Services

Incidents

Key | Short Description | Reporter | Created | Resolved | Summary
IHS-782 | My work computer cannot ssh into lsst-dev via wired ethernet | Russell Owen | 06/Feb/18 | 06/Feb/18 | User cannot log in to lsst-dev via wired connection at UW. The user's IP was blocked by NCSA Security because it was detected attempting brute-force ssh scans. The IP was in use by another device at the time. The block has since expired and the incident has been reported to UW security. The culprit was someone else with a Raspberry Pi.


Requests


Key | Short Description | Reporter | Created | Resolved | Summary
IHS-783 | Can't login to NCSA account (lsst-dev) | Patrick Ingraham | 06/Feb/18 | 06/Feb/18 | Bad entry in his .ssh/known_hosts file. Fixed.
IHS-767 | Make per-user GPFS usage data available to users | Paul Domagala | 01/Feb/18 | | Done.
IHS-760 | Please mount GPFS /project on lsst-demo | Gregory Dubois-Felsmann | 30/Jan/18 | 31/Jan/18 | Request to make /project available on the lsst-demo server, in support of the Firefly server on that host. /project is now mounted read-only on lsst-demo from GPFS via NFS.
IHS-755 | Addition of new data in /datasets/hsc | Hsin-Fang Chiang | 29/Jan/18 | 01/Feb/18 | Per RFC-440, create directory structure for new calibration data in /datasets/hsc. The following directories were created: /datasets/hsc/repo/transmission/ and /datasets/hsc/calib/20180117/
IHS-714 | Update docs on verification cluster | Simon Krughoff | 17/Jan/18 | 30/Jan/18 |
IHS-618 | Greg Daues needs privileges to manage certain GPFS filesets | Paul Domagala | 20/Nov/17 | 06/Feb/18 | It was determined that the sysadmins will take care of these requests.




Change Management

This process primarily targets requests that can be handled with current level-of-effort (LOE) resources. It is also designed to detect items that exceed LOE resources and redirect them to the EVMS process.

Successful changes proceed through 5 stages: 

  1. Business Case & T/CAM Concurrence: Check that the submitter has stated a plausible business case and the relevant T/CAM agrees
  2. Feasibility: Is the change well-formulated, address a project need and
  3. Planning: A detailed implementation plan is created which takes into account impacts, resource needs, testing and verification.
  4. Insertion: The plan is executed to implement the change.
  5. Assessment: Verification of successful change, issues analysis, documentation and close-out.


Open Change Requests


Key | Summary | Process Stage† | Reporter | Priority | Created | Status
IHS-766 | The members of the Infrastructure Working group have asked that NCSA set a 1 TB/user quota for home directories | Feasibility | Paul Domagala | Major | 01/Feb/18 | A complementary RFC has been filed: RFC-443, Re-enable per user quotas in home directories. NCSA has yet to discuss this.
IHS-580 | DM developers need a build/test environment that supports docker containers | Feasibility | Joshua Hoblitt | Minor | 02/Nov/17 | Determining exact needs and if current capabilities are adequate
IHS-576 | Configure slurm to accept jobs to use only partial nodes | Planning | Tim Morton | Major | 02/Nov/17 | Assessing the impact to other use cases
IHS-488 | Increase limits and swap space for qserv pdac | Feasibility | John Gates | | 04/Oct/17 | Discussion in several infrastructure & PDAC meetings. Fritz Mueller has the action item of needs-gathering. Waiting for feedback.

Heard on the Street This Week, but no Ticket Filed

  • New


  • Previous

    • It was suggested that per-user storage usage for each shared fileset be made available, preferably readable by any DM member.

    • Several users expressed a desire to have the Intel compiler suite (icc) available on lsst-dev
    • Increase ssh idle session timeout, which is currently 1 hr. (John Parejko via Slack) 
    • Suggestion to deploy Kubernetes on PDAC; it is assumed that this is being handled through the rolling-wave (EVMS) process
    • Tools for parallel programming in the batch computing environment (GNU Parallel and others)

Change Process Notes

  • Paul has contacted corresponding T/CAMs to understand business need, obtain concurrence & document in each LDMCR
  • Change process is being exercised, refined and socialized with T/CAMs as well as submitters

Problem Management

A problem registry has been started here to analyze incidents in an effort to identify root cause, frequency and severity.

Interactions

  1. T/CAM interactions

    ITSC

    • Nothing new

  2. PDAC

    • Last PDAC meeting 11/16/2017.

  3. Summit-base Tiger Team

    • This meeting will resume on

  4. Infrastructure

    • Last meeting
      • Shared storage usage was the major topic of discussion
      • Daily usage statistics are now generated for filesets /home, /project, /datasets, /scratch
      • The Infrastructure group was briefed on an upcoming reservation in the batch computing environment to support processing of HSC-PDR1 data.

    • Next meeting

Action Items


From last week