
Scheduled Maintenance

The next lsst-dev maintenance event is Thurs. Feb. 14. Notably, fixes for the Spectre and Meltdown vulnerabilities may be tested and released (and not retracted) by then.

See the LSST Service Status Page for details.

As-is Services

(since 12/20/2017)

Incidents

  • 6 created, 5 resolved.

Created: 

Resolved:

Discussion of Notable Issues

With regard to IHS-703, the root cause was determined to be network congestion. It has been addressed by network tuning, but architectural changes are warranted and will be discussed internally.

Requests

  • 6 created, 3 resolved

Created:

Resolved:

Change Management

This process primarily targets requests that can be handled with current level-of-effort (LOE) resources. It is also designed to detect items that exceed LOE resources and redirect them to the EVMS process.

Successful changes proceed through 5 stages:

  1. Business Case & T/CAM Concurrence: Check that the submitter has stated a plausible business case and the relevant T/CAM agrees.
  2. Feasibility: Is the change well-formulated, address a project need and
  3. Planning: A detailed implementation plan is created which takes into account impacts, resource needs, testing and verification.
  4. Insertion: The plan is executed to implement the change.
  5. Assessment: Verification of successful change, issues analysis, documentation and close-out.
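
The five stages and the EVMS redirect described above can be sketched as a small state model. This is only an illustration: the names `Stage` and `advance`, and the idea that the LOE check is applied at every stage, are assumptions and do not describe any tool the team actually uses.

```python
from enum import IntEnum


class Stage(IntEnum):
    """The five change-process stages, in order (names are illustrative)."""
    BUSINESS_CASE = 1
    FEASIBILITY = 2
    PLANNING = 3
    INSERTION = 4
    ASSESSMENT = 5


def advance(stage, within_loe):
    """Return the next stage, "EVMS" if the item is redirected, or None at close-out.

    Assumption: a request found to exceed LOE resources is redirected to the
    EVMS process regardless of which stage it is in.
    """
    if not within_loe:
        return "EVMS"
    if stage is Stage.ASSESSMENT:
        return None  # change verified and closed out
    return Stage(stage + 1)
```

For example, `advance(Stage.PLANNING, True)` moves a request on to `Stage.INSERTION`, while `advance(Stage.FEASIBILITY, False)` hands it off to EVMS.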


Open Change Requests

Key | Summary | Process Stage† | Reporter | P | Created | Status
IHS-612 | Implement debug and normal queues for developers on the verification cluster | Planning | Yusra AlSayyad | Major | 16/Nov/17 | Anticipate implementation this week
IHS-580 | DM developers need a build/test environment that supports docker containers | Feasibility | Joshua Hoblitt | Minor | 02/Nov/17 | Determining exact needs and if current capabilities are adequate
IHS-576 | Configure slurm to accept jobs to use only partial nodes | Planning | Tim Morton | Major | 02/Nov/17 | Assessing the impact to other use cases
IHS-488 | Increase limits and swap space for qserv pdac | Feasibility | John Gates |  | 04/Oct/17 | Discussion in several infrastructure & PDAC meetings. Fritz Mueller has the action item of needs-gathering. Waiting for feedback.

Heard on the Street This Week, but no Ticket Filed

  • New

    • None

  • Previous

    • It was suggested that per-user storage usage for each shared fileset be made available, preferably readable by any DM member.

    • Several users expressed a desire to have the Intel compiler suite (icc) available on lsst-dev.
    • Increase ssh idle session timeout, which is currently 1 hr. (John Parejko via Slack) 
    • Suggestion to deploy Kubernetes on PDAC; it is assumed that this is being handled through the rolling-wave (EVMS) process.
    • Tools for parallel programming in the batch computing environment (GNU parallel and others)

Change Process Notes

  • Change process is being further developed
  • Change process is being exercised, refined and socialized with T/CAMs as well as submitters

Problem Management

Report format under development

Interactions

  1. T/CAM interactions

    Mainly Slack discussions regarding infrastructure operational details (e.g., patching, system status).

  2. ITSC

    1. The following RFCs have been adopted:

    2. Notable is 

      which is also a project (EVMS) deliverable.  It's unclear how ITSC and project objectives are related, but there has been no movement on this within the ITSC.

    3. Next meeting  


  3. PDAC

    1. All systems operational.  The team continues work on milestone testing for the F17 "load and serve the WISE single-epoch data" effort.

    2. Next PDAC meeting  

  4. Summit-base Tiger Team

    1. Still suspended as implementation has taken precedence over planning. Pinged Jeff Kantor yesterday on this and he'll be rebooting meetings.

  5. Infrastructure

    1. Patches to user-facing lsst-dev servers for the Spectre and Meltdown vulnerabilities were applied. The assessment of the sysadmin & security teams is that the remaining systems are not in any immediate peril. They will wait for patches that are tested and supplied through official vendor channels.

    2. Next meeting  

Other business

(None)

Action Items

New


From last week