Current Status
NORMAL
Start | End | Event | Location | Description | Systems/services that will NOT be available | Status |
---|---|---|---|---|---|---|
Upcoming Scheduled Maintenance
(All times are Project Time (Pacific))
Start | End | Event | Location | Description | Systems/services that will NOT be available | Status |
---|---|---|---|---|---|---|
06:00 | 12:00 | February lsst-dev maintenance (regular schedule) | NCSA | | All lsst-dev systems (incl. lsst-dev01, lsst-xfer, etc., as well as PDAC and the verification cluster) | |
Third Thursday of every month, 06:00 | Third Thursday of every month, 08:00 | Monthly lsst-dev maintenance (recurring) | NCSA | | Variable. Do not expect any lsst-dev system to be available during this period. | SCHEDULED |
Every Mon. 04:00 | | Weekly /scratch purge (recurring) | NCSA | Per LSST Data Management policies, files older than 180 days will be purged from the LSST shared (GPFS) /scratch file system. Purge logs can be found in /gpfs/fs0/admin/purge_logs/scratch/ | No outage or service disruption. | SCHEDULED |
Every Tue. 08:00 | Every Tue. 10:00 | Weekly Nebula maintenance (recurring) | NCSA | Routine system updates. Computational services continue to run. | Horizon and API interfaces | SCHEDULED |
Last week in Feb. (exact date TBD) | ~6 weeks after start | Production-size run (HSC-PDR1) on the verification cluster | NCSA | Per IHS-749, ~15 batch compute nodes will be reserved to complete the HSC-PDR1 data runs. The reservation is expected to be scaled back to fewer than 10 nodes after the first couple of weeks. | All systems available. | SCHEDULED |
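The weekly /scratch purge removes files older than 180 days. To check ahead of time which of your own files are at risk, a minimal Python sketch like the one below walks a directory tree and lists candidates. It assumes that "older than 180 days" means the file's last modification time (mtime); the actual purge tool may use access time or another criterion, so treat the output as an approximation rather than a preview of the real purge log. The function name `purge_candidates` is hypothetical and not part of any NCSA tool.

```python
import os
import stat
import time

# Retention window stated in the /scratch purge policy.
PURGE_AGE_DAYS = 180
SECONDS_PER_DAY = 86400


def purge_candidates(root, now=None, age_days=PURGE_AGE_DAYS):
    """Yield paths of regular files under `root` whose mtime is more than
    `age_days` days before `now` (defaults to the current time).

    ASSUMPTION: age is judged by modification time; the real purge may differ.
    Symlinks are not followed and are never reported.
    """
    if now is None:
        now = time.time()
    cutoff = now - age_days * SECONDS_PER_DAY
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: inspect the link itself, not its target
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode) and st.st_mtime < cutoff:
                yield path
```

For example, `for path in purge_candidates("/scratch/<your-username>"): print(path)` would list at-risk files, where the per-user directory under /scratch is a hypothetical layout; substitute wherever your files actually live.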
Previous Outages & Events
Start | End | Event | Location | Description | Systems/services that will NOT be available | Status |
---|---|---|---|---|---|---|
08:00 | 08:30 | Slurm reconfiguration | NCSA | The Slurm scheduler on the verification cluster will be repartitioned from one queue (debug) into two: debug (3 nodes, MaxTime=30 min) and normal (45 nodes, MaxTime=INFINITE). | No outages | COMPLETE |
Wed 1/24/2018 13:35 | Wed 1/24/2018 14:55 | Loss of LSST NFS services | NCSA | All NFS mounts on LSST systems stopped working. | NFS access on lsst-demo and lsst-SUI | RESTORED |
16:40 | 21:00 | Firewall outage | NCSA | Both pfSense firewalls were accidentally powered off. | PDAC (Qserv and SUI) and the verification cluster were inaccessible; the outage also caused GPFS issues across many services, e.g., lsst-dev01. | RESTORED |
06:00 | 08:00 | January lsst-dev maintenance (regular schedule) | NCSA | | All lsst-dev systems (incl. lsst-dev01, lsst-xfer, etc., as well as PDAC and the verification cluster) | COMPLETE |
06:00 | 11:30 | Critical patches on lsst-dev systems (incl. kernel updates) | NCSA | | All lsst-dev systems (incl. lsst-dev01, lsst-xfer, etc., as well as PDAC and the verification cluster) | COMPLETE |
09:00 | 17:00 | Nebula maintenance | NCSA | Nebula (OpenStack) will be shut down for hardware and software maintenance from January 2, 2018 at 9:00 until January 5, 2018 at 17:00. | All Nebula systems unavailable. | COMPLETE |
Saturday | Tuesday | Support over holiday break | NCSA | 2017-12-22 to 2018-01-01 (inclusive) is the University holiday period. Services will be operational. Please report problems via the JIRA IHS queue. NCSA staff will monitor the queue, and users will be notified via JIRA whether and when their issue can be addressed. | All services will be operational. | COMPLETE |
Wednesday 06:00 | Wednesday 08:00 | NFS Server switch | NCSA | NFS services will be moved to a different host | brief outage of NFS services to SUI nodes, lsst-demo, lsst-demo2 | COMPLETED |
Wednesday 06:00 | Wednesday 07:00 | Firewall drive replacement | NCSA | The current pfSense firewall has a bad drive; if it fails, all nodes behind the firewall will be inaccessible. Because the firewalls are redundant, no service interruptions are expected. | None expected | COMPLETED |
Thursday 2017-12-14 04:00 | Thursday 2017-12-14 19:00 | December lsst-dev maintenance (off-schedule) | NCSA | | Do not expect any lsst-dev system to be available during this period. | COMPLETED |
Tuesday 2017-11-28 10:00 | TBD | Rolling reboots of PDAC qserv nodes | NCSA | | Individual qserv nodes will occasionally need to be rebooted. Experience with the first few reboots will allow NCSA to give more precise information on their order and timing. | COMPLETED |
2017-11-20 07:00 | 2017-11-20 14:00 | Nebula OpenStack cluster | NCSA | The Nebula OpenStack cluster will be unavailable for emergency hardware maintenance: a failing RAID controller in one of the storage nodes and a network switch will be replaced. | Not all instances will be impacted. Running Nebula instances affected by the outage will be shut down and restarted once maintenance is finished that day. | COMPLETED |
Thursday 2017-11-16 06:00 | Thursday 2017-11-16 10:00 | Extended monthly lsst-dev maintenance | NCSA | | Do not expect any lsst-dev system to be available during this period. | COMPLETED |
2017-10-31 | | NFS instability | NCSA | NFS becomes intermittently unresponsive. | | ~STABLE. We are guardedly optimistic that this problem has been resolved; PDAC now uses native GPFS mounts. |
2017-10-24 09:50 | | GPFS outage | NCSA | All LSST nodes in NCSA 3003 (e.g., lsst-dev01/lsst-dev7) and NPCF (verify-worker, PDAC) that connect to GPFS (natively or via NFS) lost their connection. | GPFS | ONLINE (earlier update: Storage is working to bring GPFS back online) |
2017-10-21 17:15 | | Public/protected network switch down in rack N76 at NPCF | NCSA | Affected nodes cannot reach DNS, LDAP, etc., and so largely cannot communicate with other nodes; e.g., affected verify-worker nodes cannot reach the Slurm scheduler on lsst-dev01, and affected qserv-db nodes cannot reach the rest of qserv. | Effectively, the whole verification cluster | RESTORED (earlier updates: in progress; replacement switch is on order; workaround in progress, and if all goes well, systems should be back online by late afternoon) |
2017-10-19 06:00 | 2017-10-19 14:00 | qserv-master replacement | NCSA | | qserv-master will be down for this entire period. | COMPLETE |
DM Meetings and Events
Name | Dates | Location | Notes/links |
---|---|---|---|
JupyterCon 2018 | 2018/08/21-24 | New York City | https://www.oreilly.com/conferences/ (call for speakers: 2018/01-2018/02) |
LSST2018 Project & Community Workshop | 2018/08/13–17 | Tucson, AZ | |
 | 2018/06/11–15 | Lyon, France | |
SPIE 2018 | 2018/06/10-15 | Austin, TX | SPIE Conference in Austin, Meeting: 10 June – 15 June 2018 |
IVOA InterOp Northern Spring | 2018/05/28-06/01 | CADC, Victoria, BC | |
DMLT face to face | 2018/05/22-24 | SLAC or UW | |
Python in Astronomy 2018 | 2018/04/30-05/04 | New York, NY | Deadline for applications is December 9th. |
DM Joint Meeting with Systems Engineering | 2018/03/06-08 | IPAC, Pasadena, CA | |
DESC Meeting | 2018/02/05–09 | SLAC | https://confluence.slac.stanford.edu/display/LSSTDESC/February+2018+Collaboration+Meeting+-+SLAC |
Jupyter Widgets Workshop | 2018/01/23-26 | Saclay, France | Developer-centered workshop at the CMAP Laboratory, Ecole Polytechnique. Some details are at the end of this GitHub thread, or contact Sylvain Corlay (sylvain.corlay@gmail.com). |
DM Gen. 3 Middleware Meeting | 2018/01/22-25 | Princeton, NJ | Internal DM meeting to further develop the SuperTask/Butler designs and do some collaborative development. The agenda and list of attendees are still in progress. |
DM Boot Camp 2 | TBD | | |
231st AAS Meeting | 2018/01/08–12 | Washington, DC | |
Towards Science in Chile with LSST | 2017/12/13-15 | Santiago, Chile | https://www.lsst-chile.cl/2017-workshop |