This describes the major traffic types to be carried by the long haul networks and lists the requirements for ensuring delivery.  These are summary descriptions; the details are contained in the DM Sizing Model, in particular LSE-81 Science Project Requirements and LDM-141 Storage Sizing and I/O.  (There are some "stale" numbers in those documents, e.g. compression is not in LSE-81 but is in LDM-141, and neither was updated for the change from 16-bit to 18-bit image data, but those changes do not appreciably affect the bandwidth reservation/fail-over scheme.)

Notable points from prior discussions:
  1. The "Other LSST Operations Traffic" is now grouped with "non-LSST AURA Traffic", since both of these go over the AURA circuits rather than the LSST circuits.
  2. The OCS/TCS/CCS/DMCS traffic was split out from the (DM) Data Backbone traffic.  While both go over the same network paths, they carry distinct content and can be prioritized individually.
  3. According to Kian-Tat Lim, DM systems should not need to be informed of network outages, although failure returns from send calls are preferred over calls that never return (which would necessitate an external timeout). The services will detect failures to accomplish their missions and retry, record, and/or alert as necessary. Telemetry to the OCS is intended to indicate when such failures are occurring; the OCS could use this to disable the DM system if desired (e.g. as part of a transition to "degraded mode"), but this is not required.  The bottom line is that the network does not need to notify the OCS.


For network design purposes, there are three numbers associated with each grouping:
  1. Minimum guaranteed reservation (MGR) - This bandwidth must be guaranteed in all normal and failure scenarios except total network failure, i.e. where no paths are up between two connected sites.  The worst case scenario here is that the primary LS - SCL link is down and only the secondary is available at 40 Gbps. Only selected groups have a non-zero MGR.
  2. Best effort target allocation (BETA) - This bandwidth is what we would reserve for that traffic grouping if we have at least 100 Gbps connecting all sites.  This partitioning may be accomplished via QoS/prioritization rather than an actual reservation.
  3. Full capacity target allocation (FCTA) - This bandwidth is what we would reserve for that traffic grouping if we have all paths connecting all sites operating at full capacity.  This partitioning may be accomplished via QoS/prioritization rather than an actual reservation.
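
As an illustration only (a hypothetical sketch, not a specification of the actual QoS configuration), the snippet below models the three numbers for a traffic group and shows which one applies in a given network state. The 40 Gbps and 100 Gbps thresholds are the ones named above; the Science Data Transfer values (40/40/40 Gbps) are listed in the South -> North groups below.

```python
# Hypothetical model of the three per-group numbers (Gbps) and which one
# applies for a given aggregate capacity between two connected sites.
from dataclasses import dataclass

@dataclass
class Allocation:
    mgr: float   # Minimum guaranteed reservation
    beta: float  # Best effort target allocation (>= 100 Gbps case)
    fcta: float  # Full capacity target allocation (all paths at full capacity)

def applicable_target(alloc: Allocation, capacity_gbps: float,
                      all_paths_full: bool) -> float:
    """Pick the applicable number: FCTA when every path runs at full capacity,
    BETA when at least 100 Gbps connects all sites, otherwise only the MGR
    (e.g. when just the 40 Gbps secondary LS - SCL link is up)."""
    if all_paths_full:
        return alloc.fcta
    if capacity_gbps >= 100:
        return alloc.beta
    return alloc.mgr

# Example: Science Data Transfer keeps its 40 Gbps even in the worst case.
science = Allocation(mgr=40, beta=40, fcta=40)
assert applicable_target(science, capacity_gbps=40, all_paths_full=False) == 40
```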
     
South -> North Traffic Groups for QoS/Prioritization (in descending priority order):
  • Science Data Transfer - "Prompt Processing" transfer of crosstalk-corrected science images and associated EFD meta-data within 7 seconds of exposure readout from Summit Site to Archive Site (2 seconds Summit - Base, 5 seconds Base - Archive).  This is the only group that has dedicated channels in the Summit - Base network, and it is guaranteed to be transferred in all failure scenarios except total network failure between two connected sites.  Note that if an outage is short enough (TBD seconds), the DM forwarder attempts to retransmit crosstalk-corrected images after the outage; otherwise, those images are overwritten and "lost".  MGR 40 Gbps. BETA 40 Gbps.  FCTA 40 Gbps.  Also note that there is a small amount of setup and handshaking traffic in both directions on this link to enable the data transfer.
    • Note: The crosstalk correction is expected to be done "on the fly" by the Camera Data System (Camera DAQ). If an outage between Summit and Base is short enough, DM will re-retrieve and retransmit the crosstalk-corrected images. If an outage between Base and Archive is short enough, DM will retransmit over the WAN. If the outage is longer in either case, the original raw images remain in the CDS, but DM does not expect to retrieve or process them as crosstalk-corrected. There is also daytime transfer of raw calibration images for prompt assessment, but the volume is lower than at night.

  • OCS/TCS/CCS/DMCS - This is remote control information, including commands, status, environmental, and ad hoc image information.  The traffic includes output from high quality (HD) in-dome streaming video cameras (at 10 fps) and mirroring of Summit Control Room displays at the Base, the Archive, and HQ.  It also includes any transfer of images and diagnostic information to the SLAC camera support location, as well as recovery of the Summit EFD instance from the Base after outages.  Other traffic might include accesses from the Summit to the Observatory Operations Data Service at the Base, telemetry information from NCSA systems to be gatewayed to the OCS by the Base DMCS, and Identity Management synchronization between NCSA, Base, and Summit. Remote viewing of NCSA monitoring and management systems could also flow over this link. It is not clear if the Summit displays are planned to be mirrored at NCSA, but nothing should prevent it. MGR 0 Gbps.  BETA 20 Gbps. FCTA 40 Gbps.

  • Data Backbone - Transfer of raw, uncorrected images within 24 hours of exposure readout from Summit Site to Archive Site (and catch-up of buffered raw images after an outage).  The raw images are also cached at the Base Site.  Also transfers of EFD and AUX data for processing/archiving, LSST operations user-initiated file transfers for diagnostic purposes, and results of internal QA-driven data queries. (Note that the DM term "Data Backbone" refers to the DM software service, i.e. the application layer from a network standpoint, that is layered on the networks and is responsible for these data transfers.)  MGR 0 Gbps.  BETA 20 Gbps. FCTA 40 Gbps.
  • Chilean Traffic - Per the AURA - Chile MOU granting us permission to host the LSST Observatory in Chile, we are required to give Chilean astronomers access to the networks when they are available and not in use by LSST.  The implementation details of that agreement remain to be defined. MGR 0 Gbps.  BETA 0 Gbps. FCTA 20 Gbps.
  • Other LSST Operations Traffic, non-LSST AURA Traffic - This is transfer of LSST "routine" operations traffic (web, email, video, VoIP, etc.), and Gemini, CTIO, and any other non-LSST AURA tenant data. In general, none of this traffic is carried on LSST circuits; it shares the AURA circuits.  If AURA circuits fail, there is currently no plan to fail over to LSST circuits for this traffic.  This topic is currently under discussion. MGR 0 Gbps. BETA 0 Gbps. FCTA 0 Gbps.
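
As a quick consistency check (a sketch using the numbers above; the group keys are abbreviated), the allocations sum so that only the Science Data Transfer MGR must be carried on the 40 Gbps secondary link, and the BETA totals fit within 100 Gbps:

```python
# South -> North (MGR, BETA, FCTA) allocations in Gbps, as listed above.
south_to_north = {
    "Science Data Transfer":          (40, 40, 40),
    "OCS/TCS/CCS/DMCS":               (0, 20, 40),
    "Data Backbone":                  (0, 20, 40),
    "Chilean Traffic":                (0,  0, 20),
    "Other LSST Ops / non-LSST AURA": (0,  0,  0),
}

mgr_total, beta_total, fcta_total = (
    sum(v[i] for v in south_to_north.values()) for i in range(3)
)
assert mgr_total <= 40    # 40 Gbps: fits the secondary LS - SCL link alone
assert beta_total <= 100  # 80 Gbps: fits the >= 100 Gbps best-effort case
print(mgr_total, beta_total, fcta_total)  # 40 80 140
```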
      
North -> South Traffic Groups for QoS/Prioritization (in descending priority order):
  • Calibration Data - Transfer of calibration images and catalogs, crosstalk-correction matrices, detector characteristics, and atmospheric characteristics. This will be managed by the Data Backbone service; DM does not expect this to be configurable at the network level. MGR 40 Gbps. BETA 50 Gbps. FCTA 50 Gbps.
  • Annual Data Release - Post-DRP-processing transfer of catalogs and calibrated image files.  DRP results will generally be transferred proactively as they are generated throughout the year, not in a massive burst after the completion of the DRP. This will be managed by the Data Backbone service; DM does not expect this to be configurable at the network level. MGR 0 Gbps.  BETA 50 Gbps. FCTA 50 Gbps.
    • Note: While a customized release also goes to LSST EPO, hosted in the cloud, at present this data is not expected to go over the LSST LHN system.
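
The same quick check for the North -> South groups (again a sketch using only the numbers above): only the Calibration Data MGR must be carried on the 40 Gbps secondary link, and the BETA totals exactly fill 100 Gbps:

```python
# North -> South (MGR, BETA, FCTA) allocations in Gbps, as listed above.
north_to_south = {
    "Calibration Data":    (40, 50, 50),
    "Annual Data Release": (0,  50, 50),
}

mgr_total, beta_total, fcta_total = (
    sum(v[i] for v in north_to_south.values()) for i in range(3)
)
assert mgr_total <= 40    # 40 Gbps: fits the secondary LS - SCL link alone
assert beta_total <= 100  # 100 Gbps: exactly fills the >= 100 Gbps case
print(mgr_total, beta_total, fcta_total)  # 40 100 100
```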

Transatlantic Traffic (email thread with Kian-Tat Lim)

From: Jeff Kantor <JKantor@lsst.org>

Subject: [LSST|lsst-net #352] Fwd: DRP sizing

Date: March 24, 2020 at 8:01:09 AM MST

To: "lsst-net@lists.lsst.org" <lsst-net@lists.lsst.org>

Cc: Kian-Tat Lim <ktl@slac.stanford.edu>, Michelle Butler <mbutler@ncsa.uiuc.edu>


Hello all,


I had the action to better characterize the data flows between NCSA and IN2P3.  I enlisted the help of K-T Lim from DM, and here is what we determined, derived from the latest sizing model.  My back-of-the-envelope calculation, using the numbers in K-T's message below and assuming that a 200-day processing time implies 200 days to transfer the data as well (ignoring faster PVIs for now), gives these average daily transfer volumes:

NCSA to IN2P3

LSST Operations Year 1: (4816 + 3864 + 6743 + 3987)/200 = ~100 TB/day

LSST Operations Years 2 - n increases by: (Yr-1)*(6743 + 3987)/200 = ~50 TB/day, so

Year 2 ~150 TB/day, Year 3 ~200 TB/day, … Year 10 ~600 TB/day

Year 10 600 TB/day = 4.8 Pb/day or 25 TB/hr = 7 GB/sec = 56 Gb/s

So, the network can start at 10 Gb/s in Year 1, but needs to grow approximately 5 Gb/s per year

Even accounting for a 120-day processing period and faster PVIs, it appears we will need significantly more than 10 Gb/s and significantly less than 100 Gb/s by Year 10

IN2P3 to NCSA

This is less by 4816/200 = ~25 TB/day, so also between 10 - 100 Gb/s


KT Comment: The raws can actually go a bit slower (over the whole year), but that's more than offset by the other data having to go faster, as they won't all show up immediately and need to be there well before the end.


Cheers!


Jeff
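
For reference, here is a minimal Python sketch of the back-of-the-envelope arithmetic above (annual volumes are from K-T's message in the forwarded thread below; the 200-day transfer window is the stated assumption, and faster PVI cadences are ignored):

```python
# Back-of-the-envelope NCSA -> CC-IN2P3 transfer rates, reproducing the
# estimate above. Annual volumes (TB) are from K-T's message; a 200-day
# transfer window is assumed.
RAW, COADD, PVI, CATALOG = 4816, 3864, 6743, 3987  # TB per operations year
TRANSFER_DAYS = 200

def ncsa_to_in2p3_tb_per_day(op_year: int) -> float:
    # Raw and coadd volumes are flat; PVI and catalog volumes grow with the
    # cumulative survey (op_year multiplier), per the LOY1/LOY2/... figures.
    annual_tb = RAW + COADD + op_year * (PVI + CATALOG)
    return annual_tb / TRANSFER_DAYS

def tb_per_day_to_gbps(tb_per_day: float) -> float:
    return tb_per_day * 8e12 / 86400 / 1e9  # TB/day -> Gb/s

for year in (1, 2, 10):
    daily = ncsa_to_in2p3_tb_per_day(year)
    print(f"Year {year}: ~{daily:.0f} TB/day, ~{tb_per_day_to_gbps(daily):.0f} Gb/s")
# Year 1: ~97 TB/day (~9 Gb/s); Year 10: ~580 TB/day (~54 Gb/s), i.e. between
# 10 and 100 Gb/s by Year 10, consistent with the rounded figures above.
```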


Begin forwarded message:

From: Kian-Tat Lim <ktl@slac.stanford.edu>

Subject: Re: DRP sizing

Date: March 18, 2020 at 4:37:37 PM MST

To: Jeff Kantor <JKantor@lsst.org>


Jeff,


Apologies for the delay.  Here are the primary DRP data products that need to flow each year between the LDF and CC-IN2P3.  These numbers are taken from DMTN-135, with factors of 1/2 in appropriate places.


Raw images: 4816 TB to CC-IN2P3

Coadd images: 3864 TB to CC-IN2P3, 3864 TB from CC-IN2P3

PVIs (lossless-compressed): 6743 TB/year to CC-IN2P3, 6743 TB/year from CC-IN2P3 (6743 in LOY1, 13485 in LOY2, 20229 in LOY3, etc.)

Catalogs: 3987 TB/year to CC-IN2P3, 3987 TB/year from CC-IN2P3 (3987 in LOY1, 7973 in LOY2, 11961 in LOY3, etc.)


It may be more practical to generate all the PVIs at both sites, rather than transferring them.


I would expect the bidirectional numbers to increase about 10-20% over these nominal figures to handle overlapping spatial regions, consistency checks, etc.


I presume that CC-IN2P3 will load any database they have, whether Qserv or not, locally, rather than transferring that data over the network.


Generally we are assuming 200 days to run a DRP.  This means that some of these datasets need to be transferred in significantly less time.  Having sufficient bandwidth for transferring all the PVIs in, e.g., 50 days would be advantageous.  (14 days would be even better but may not be feasible.)  If we are not transferring PVIs, then transferring all the catalogs in, say, 120 days might be a reasonable goal.

-- 

Kian-Tat Lim, Rubin Observatory/LSST Data Management, ktl@slac.stanford.edu



