Virtual NOC: VNOC requirements (refer to LSST NET Meeting report section on VNOC)

Three aspects of VNOC information sharing:
  Maintenance notifications - Vincenzo Capone to lead information collection/documentation
  Fault detection/troubleshooting/recovery notifications - Albert Astudillo to lead information collection/documentation
  Performance monitoring - Julio Ibarra to lead information collection/documentation

Two levels of notification/distribution:
  1. Within VNOC (network operators). THIS IS OUR CURRENT FOCUS.
  2. LSST operations centers (interpret the above for LSST ops impacts). THIS NEEDS TO BE DEFINED WITH LSST OPS PEOPLE.

Notifications for maintenance, outages, recovery
Email is not reliable; automation is needed for scaling.
IETF draft standard: https://tools.ietf.org/html/draft-gunter-calext-maintenance-notifications-00
AA, JI - Good draft, could use it, but let's see what operators currently do.
JK - If we standardize on this format, we will still need conventions for IDs so that they can be understood by all (e.g. Object-ID).
RL - Someone will have to interpret these messages. How do we get operators to produce notifications in this format?
AA - Open to doing this, but must get agreement from operations.
VC - GEANT uses an internal ticketing/notification system; little possibility to change.

Performance Monitoring
JP - perfSONAR is our common denominator. It produces transport capacity and latency. Need to see what else is exportable. Most data we collect and send out will be SNMP or perfSONAR, maybe NetFlow, others. The NetSage project is working on this.
MK - perfSONAR nodes need to be on the same subnets/VLANs, and dedicated, to avoid noise from other traffic and applications.
RL - Need 100G NICs to support this.

Notification speed
RL - When there is a fault or trouble, we need to know where the problem is. NOCs know this, but it takes a long time to find out. Need to speed this up for VNOC.
AA - REUNA sends an initial message within 5-10 minutes after detection, then every 30 minutes until resolved.
JI - This is a complex space to quantify; with submarine cables it can be a while to report. Varies by service provider; easier where we have more control of the network infrastructure.
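
Illustrative sketch (not discussed at the meeting): the REUNA cadence AA describes could be automated roughly as below. The function name `notification_times`, the 10-minute default (the upper end of the 5-10 minute window), and the final notice sent at resolution time are all assumptions made for illustration, not an agreed VNOC convention.

```python
from datetime import datetime, timedelta


def notification_times(detected_at, resolved_at,
                       initial_delay=timedelta(minutes=10),
                       repeat_every=timedelta(minutes=30)):
    """Return the times at which fault updates would be sent.

    Assumed policy: one initial message `initial_delay` after detection,
    a repeat every `repeat_every` while the fault is open, and a final
    recovery notice at `resolved_at` (the recovery notice is an assumption).
    """
    times = []
    t = detected_at + initial_delay
    while t < resolved_at:
        times.append(t)
        t += repeat_every
    times.append(resolved_at)  # closing notice when the fault is resolved
    return times


if __name__ == "__main__":
    detected = datetime(2016, 1, 1, 12, 0)
    resolved = datetime(2016, 1, 1, 13, 25)
    for ts in notification_times(detected, resolved):
        print(ts.strftime("%H:%M"))
```

A fixed schedule like this only covers the operator-to-VNOC level; as JI notes, actual reporting delays vary by provider, so the delays would likely need to be configurable per operator.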