
Current Status

NORMAL



Upcoming Scheduled Maintenance

(All times are Project Time (Pacific))

Start | End | Event | Location | Planned Activities | Systems/services that will NOT be available | Status
Every Tue. 08:00

Every Tue. 10:00

Recurring

Weekly Nebula Maintenance

NCSA

Routine system updates. Computational services continue to run.

Horizon and API interfaces.

-

Third Thursday of every month 06:00

Third Thursday of every month 08:00

Recurring

Monthly lsst-dev maintenance

NCSA
  • Routine system updates.
  • Note that the July 20 maintenance event will include permanently unmounting the remaining NFS filesystems.
Variable. Do not expect any lsst-dev system to be available during this period.

-

Previous Outages & Events

Start | End | Event | Location | Planned Activities | Outcome

Aug. 24, 06:00

Aug. 24, 13:30

LSST Dev infrastructure upgrades

NCSA
  • Infrastructure upgrades to all LSST-dev resources:

    • lsst-dev01
    • lsst-db
    • lsst-xfer
    • lsst-web (lsst7)
    • lsst-dts
    • lsst-dbdev environment
    • lsst-daq test stand
    • lsst-dbb (data backbone) test stand
    • lsst-elast (elastic compute) test stand

Completed successfully


Aug. 24, 06:00

Aug. 24, 07:30

LSST Dev patching

NCSA
  • Regular maintenance (Puppet releases and patching) of NPCF-based systems
  • IHS-416

Completed. Note the separate status message for "LSST Dev infrastructure upgrades", which includes systems in NCSA 3003 and is scheduled for 06:00 to 15:00.

Maintenance on the following:

  • Cluster Services
  • Verification cluster
  • Prototype DAC
  • Affected systems include: adm01, backup01, bastion01, monitor01, object*, qserv*, sui*, verify-worker*, test0*

Aug. 22, 06:00


Decommissioning of lsst-dbdev machines

NCSA
  • The lsst-dbdev nodes are obsolete and unused. They will be powered down but retained for a few months in case the data on them is needed.

Done.


2017-07-20 04:00

2017-07-20 08:00

Monthly lsst-dev maintenance

NCSA
  • Routine system updates.
  • Note that this event will begin 2 hours earlier than normal.
  • Note that the July 20 maintenance event will include permanently unmounting the remaining NFS filesystems.
  • IHS-365

See IHS-365 for details.

verify-worker31 suffered a failure and will be out of commission for a while
2017-06-22 (06:00)

2017-06-22 (10:00)

Critical Kernel upgrades

NCSA

Upgrade kernel and system packages to address the Stack Guard Page vulnerability. See also: IHS-324

All NCSA hosted resources (except Nebula).

UPDATE (08:00 PT): The outage is being extended until 10:00 PT.

The outage was completed at 10:00 PT. Some nodes did not come back up; see the ticket for details.


2017-06-15 (06:00)

2017-06-15 (07:30)

Deploy Unbound to LSST cluster nodes (verify-worker*, qserv*, sui*, bastion01, test*, backup01)

NCSA

DNS resolution may be briefly delayed (~30 minutes).

Updates deployed successfully via a new Puppet module. All tests passed.

2017-06-04

2017-06-05

DAQ installation

NCSA

Postponed to a later date by Mike Huffer.

Pushed back - yet to be rescheduled

2017-05-18 (06:00)

2017-05-18 (08:00)

LSST monthly maintenance

NCSA
  • Kernel upgrades and reboots in the LSST dev environment
  • Permanently unmount the old NFS home filesystem. This completes the decommissioning of the NFS home filesystem.
  • Install of Unbound local caching resolver software as recommended by NCSA Security
  • Remount of remaining NFS exports in read-only mode. Users should migrate any old data off of NFS in preparation for the final NFS decommissioning. NFS is expected to be turned off entirely on July 20, 2017.
  • Upgrade to the latest MySQL 5.5.56 on lsst-db.ncsa.illinois.edu.

Success.


2017-05-04 (09:30)

2017-05-04 (10:00)

Unplanned

lsst-dev file systems full

NCSA

lsst-dev

The lsst-dev filesystems / and /home filled up at approximately 09:30 as a result of inode usage by another process.

The admins freed up inodes to make the filesystem responsive again.

Admins are currently tracking down the root cause.


2017-04-27 (13:11)

2017-04-27 (14:20)

Unplanned

Nebula outage

GlusterFS crashed due to a bug, so no instances could access their filesystems.

All instances running on Nebula

The node that systems were mounting from had to be rebooted; while waiting for the reboot, the admins took the opportunity to upgrade all Gluster clients on other systems. Version 3.10.1 fixes the bug. All instances with errors in their logs were restarted.

2017-04-20 (04:30)

2017-04-20 (09:30)

LSST monthly maintenance

NCSA

This event is cancelled so as not to interfere with Early Integration Activity #03 being held at NCSA April 19 & 20.

Event cancelled; nothing bad happened.

2017-04-17 (13:41)

2017-04-17 (13:53)

Unplanned

lsst-dev login node down

NCSA

Users unable to log in to lsst-dev.

The probable cause is that the root filesystem filled up due to excessive logging.

Fixed
2017-03-27 (22:00)

2017-03-29 (14:00)

Blue Waters maintenance

NCSA

Due to maintenance of the cooling infrastructure at NPCF, Blue Waters will be down during this period. Cray will also use this maintenance window to perform some system updates.

Systems that will be down

  • Slurm cluster compute nodes will be powered down for the duration of the outage.

Systems that will remain up

Qserv nodes (lsst-qserv-*), SUI nodes (lsst-sui-*), and the Bastion node (lsst-bastion01) should remain online during the outage.

However, if temperatures in the NPCF rise too high, we will be forced to shut these down as well. I've been told that this is a low-probability scenario and we will be given time to do graceful shutdowns. In the unlikely event that this happens, it will be communicated through the DM Slack channel and also posted here.

All systems normal


2017-03-23 (08:00)

2017-03-23 (13:00)

NCSA Nebula Outage

NCSA

Nebula will take an outage to rebalance the file system and build a more stable setup. All instances will be paused, and Horizon will be unavailable.

Nebula is back to normal.


2017-03-16 (04:30)

2017-03-16 (09:30)

LSST monthly maintenance

NCSA

GPFS filesystems will go offline for the entire duration of the outage. Some systems may be rebooted, especially those that mount one or more of the GPFS filesystems.

2017-02-22 (14:15)

2017-02-22 (16:15)

Nebula Gluster issues

NCSA

All Nebula instances paused while Gluster was repaired.

Nebula is available.


Important Project Dates

(events marked with an asterisk * are LSSTC-funded):

2017

 

April 24-28

Data Science Fellowship Program – Session 3*, Tucson, AZ

May 1 – 3

NSF Large Facilities Workshop, Baton Rouge and Livingston, LA

May 1 – 5

AURA Board and Member Representatives Annual Meeting, Tucson, AZ

May 12 - 13

LSST Detection of Optical Counterparts of Gravitational Waves*, BNL. Contact Morgan May for additional information.

May 22 – 25

Infrastructure for Time Domain Science in the Era of LSST, Tucson, AZ

May 31 - June 2

Supernovae: The LSST Revolution Workshop*, Northwestern University, Evanston, IL

June 12 – 16

Getting Ready for Doing Science with LSST Data*, IN2P3, Lyon, France

June 19 – 21

AURA Workforce and Diversity Committee (WDC), Maui, HI

July 10 - 14

DESC Meeting, Dark Energy School, and Hack Day*, jointly hosted by Stony Brook University & BNL

July 25 – 27

NSF/DOE Joint Status Review of Data Management, NCSA, IL

August 14 – 18

LSST 2017 Project & Community Workshop, Tucson, AZ

September 6 – 8

NSF/DOE Joint Status Review, Tucson, AZ

September 14 – 15

AURA Management Council for LSST (AMCL) Meeting, Tucson, AZ

October 26 – 28

Society of Women Engineers WE17 Conference, Austin, TX

Get your LSST gear at our storefront: https://business.landsend.com/store/lsst/ 



