...
Current Status
Status: Maintenance (yellow), Normal (green)
Start | Event | Location | Description | Systems/services that will NOT be available | Status
---|---|---|---|---|---
06:00 | Critical patches on lsst-dev systems (incl. kernel updates) | NCSA | | All lsst-dev systems (incl. lsst-dev01, lsst-xfer, etc., as well as PDAC and the verification cluster) | |

Request Support
Upcoming Scheduled Maintenance
...
Start | End | Event | Location | Description | Systems/services that will NOT be available | Status
---|---|---|---|---|---|---
06:00 | 10:00 | Critical patches on lsst-dev systems (incl. kernel updates) | NCSA | | All lsst-dev systems (incl. lsst-dev01, lsst-xfer, etc., as well as PDAC and the verification cluster) | |
06:00 | 06:00 | January lsst-dev maintenance (regular schedule) | NCSA | | All lsst-dev systems (incl. lsst-dev01, lsst-xfer, etc., as well as PDAC and the verification cluster) | |
Third Thursday of every month 06:00 | Third Thursday of every month 08:00 | Recurring-Monthly: Monthly lsst-dev maintenance | NCSA | | Variable. Do not expect any lsst-dev system to be available during this period. | |
Every Mon. 04:00 | | Recurring-Weekly: Purge of GPFS /scratch partition | NCSA | Per LSST data management policies, files older than 180 days will be purged from the LSST shared (GPFS) /scratch file system. Purge logs can be found in /gpfs/fs0/admin/purge_logs/scratch/ | No outage or service disruption. | |
Every Tue. 08:00 | Every Tue. 10:00 | Recurring-Weekly: Weekly Nebula Maintenance | NCSA | Routine system updates. Computational services continue to run. | Horizon and API interfaces. | |

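
For users who want to check which of their own files would fall under the 180-day purge of /scratch, a minimal sketch follows. It assumes that file age is measured from the last modification time, which the policy text does not specify, and the function name `purge_candidates` is illustrative only; it is not an NCSA tool and does not reflect how the actual purge is implemented.

```python
import os
import time
from pathlib import Path

PURGE_AGE_DAYS = 180  # threshold stated in the purge policy


def purge_candidates(root, now=None, age_days=PURGE_AGE_DAYS):
    """Yield (path, age_in_days) for files under root older than age_days.

    Assumption: age is taken from st_mtime (last modification time).
    The real purge may use a different timestamp.
    """
    now = time.time() if now is None else now
    cutoff = now - age_days * 86400  # seconds per day
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                mtime = path.lstat().st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if mtime < cutoff:
                yield path, (now - mtime) / 86400
```

Iterating over `purge_candidates()` on a directory you own under /scratch lists each file past the threshold with its age in days. The purge logs mentioned in the schedule remain the authoritative record of what was actually removed.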
...
Start | End | Event | Location | Planned Activities | Systems/services that will NOT be available | Status
---|---|---|---|---|---|---
06:00 | 11:30 | Critical patches on lsst-dev systems (incl. kernel updates) | NCSA | | All lsst-dev systems (incl. lsst-dev01, lsst-xfer, etc., as well as PDAC and the verification cluster) | |
09:00 | 17:00 | Nebula | NCSA | Nebula (OpenStack) will be shut down for hardware and software maintenance from January 2nd, 2018 at 9am until January 5th, 2018 at 5pm. | All Nebula systems unavailable. | |
Saturday | Tuesday | Support over holiday break | NCSA | 2017-12-22 to 2018-01-01 (inclusive) is the University holiday period. Services will be operational. Please report problems via the JIRA IHS queue. The queue will be monitored by NCSA staff, and users will be notified via JIRA whether and when their issue can be addressed. | All services will be operational. | |
Wednesday 06:00 | Wednesday 08:00 | NFS Server switch | NCSA | NFS services will be moved to a different host. | Brief outage of NFS services to SUI nodes, lsst-demo, lsst-demo2. | |
Wednesday 06:00 | Wednesday 07:00 | Firewall drive replacement | NCSA | The current pfSense firewall has a bad drive. If it fails, all nodes behind the firewall will be inaccessible. There are redundant firewalls, so no service interruptions are expected. | None expected. | |
Thursday 2017-12-14 04:00 | Thursday 2017-12-14 19:00 | December lsst-dev maintenance (off-schedule) | NCSA | | Do not expect any lsst-dev system to be available during this period. | |
Tuesday 2017-11-28 10:00 | TBD | Rolling reboots of PDAC qserv nodes | NCSA | | The occasional qserv node will need to be rebooted. Experience with the first couple will allow NCSA to give more precise information on the order and timing of the reboots. | |
2017-11-20 07:00 | 2017-11-20 14:00 | Nebula OpenStack cluster | NCSA | Nebula OpenStack cluster will be unavailable for emergency hardware maintenance. A failing RAID controller from one of the storage nodes and a network switch will be replaced. | Not all instances will be impacted. Any running Nebula instances affected by the outage will be shut down, then restarted after maintenance finishes that day. | |
Thursday 2017-11-16 06:00 | Thursday 2017-11-16 10:00 | Extended monthly lsst-dev maintenance | NCSA | | Do not expect any lsst-dev system to be available during this period. | |
2017-10-31 | | NFS instability | NCSA | NFS becomes intermittently unresponsive. | | We are guardedly optimistic that this problem has been resolved. PDAC is now utilizing native GPFS mounts. |
2017-10-24 09:50 | LSST | GPFS outage | NCSA | All LSST nodes from NCSA 3003 (e.g., lsst-dev01/lsst-dev7) and NPCF (verify-worker, PDAC) that connect to GPFS (as GPFS or NFS) have lost their connection. | GPFS | Storage is working to bring GPFS back online. |
2017-10-21 17:15 | LSST | Public/protected network switch is down in rack N76 at NPCF | | Nodes cannot reach DNS, LDAP, etc., so they largely cannot communicate with other nodes; e.g., no communication between affected verify-worker nodes and the Slurm scheduler on lsst-dev01, and none between affected qserv-db nodes and the rest of qserv. | Effectively, the whole verification cluster | In progress; replacement switch is on order. Workaround in progress. If all goes well, systems should be back online by late afternoon. |
2017-10-19 06:00 | 2017-10-19 14:00 | qserv-master replacement | NCSA | qserv-master will be down so that systems engineering can finish configuring the new server and transferring files. Status updates here: | qserv-master will be down for this entire period. | |

...