perfSONAR Hosts description
Each host has three interfaces:
- One 10G interface for BWCTL
- One 10G interface for latency and loss (OWAMP)
- One 10G or 1G interface for management
Login information
City | Node | Mgmt IP (interface) | Throughput interface | Loss interface |
---|---|---|---|---|
Miami (AmLight) | ps-mia.ampath.net | 190.103.184.93 (p1p2.4003) | p1p2 | em3 |
Huechuraba (AmLight) | ps-scl.sdn.amlight.net | 190.103.184.113 (em1.4002) | p2p2 | p2p1 |
São Paulo (AmLight) | ps-spo.sdn.amlight.net | 200.136.41.86 (enp7s0f1) | enp11s0f0 | enp11s0f1 |
Panamá City (AmLight) | ps-pty.sdn.amlight.net | 200.0.207.47 (em1.7) | p1p2 | em1 |
La Serena (AURA) | ps-aura.sdn.amlight.net | 139.229.127.104 (em1) | p1p2 | p2p2 |
La Serena (LSST) | perfsonar1.ls.lsst.org | 139.229.135.17 (em1) | p2p1 | p2p2 |
Champaign (NCSA) | ps-ncsa | 141.142.129.31 (em1) | em2 | p2p1 |
Cerro Pachón (LSST) | TBD | TBD | TBD | TBD |
Santiago (REUNA) | TBD | TBD | TBD | TBD |
High-level Perfsonar Deployment Diagram
Configured Tests
Test | Type | Path BW | Throughput VLAN | Throughput subnets | Loss VLAN | Loss subnets |
---|---|---|---|---|---|---|
MIA x PTY | Per Path | 100G | 706 | 10.7.6.0/24, fd95:26fb:39ab:0706::/64 | 707 | 10.7.7.0/24, fd95:26fb:39ab:0707::/64 |
MIA x Huechuraba | Per Link | 10G | 700 | 10.7.0.0/24, fd95:26fb:39ab:0700::/64 | 701 | 10.7.1.0/24, fd95:26fb:39ab:0701::/64 |
MIA x Huechuraba | Per Path | 100G (Atlantic) | 718 | 10.7.18.0/24, fd95:26fb:39ab:0718::/64 | 719 | 10.7.19.0/24, fd95:26fb:39ab:0719::/64 |
MIA x Huechuraba | Per Path | 100G (Pacific) | 716 | 10.7.16.0/24, fd95:26fb:39ab:0716::/64 | 717 | 10.7.17.0/24, fd95:26fb:39ab:0717::/64 |
MIA x SPO | Per Path | 100G (Atlantic) | 708 | 10.7.8.0/24, fd95:26fb:39ab:0708::/64 | 709 | 10.7.9.0/24, fd95:26fb:39ab:0709::/64 |
MIA x SPO | Per Path | 100G (Monet) | 724 | 10.7.24.0/24, fd95:26fb:39ab:0724::/64 | 725 | 10.7.25.0/24, fd95:26fb:39ab:0725::/64 |
PTY x Huechuraba | Per Link | 100G | 702 | 10.7.2.0/24, fd95:26fb:39ab:0702::/64 | 703 | 10.7.3.0/24, fd95:26fb:39ab:0703::/64 |
Huechuraba x SPO | Per Link | 100G | 704 | 10.7.4.0/24, fd95:26fb:39ab:0704::/64 | 705 | 10.7.5.0/24, fd95:26fb:39ab:0705::/64 |
Huechuraba x LSC (LSST) | Per Path | 100G | 726 | 10.7.26.0/24, fd95:26fb:39ab:0726::/64 | 727 | 10.7.27.0/24, fd95:26fb:39ab:0727::/64 |
LSC (LSST) x Cerro Pachón | Per Path | 100G | 728 | 10.7.28.0/24, fd95:26fb:39ab:0728::/64 | 729 | 10.7.29.0/24, fd95:26fb:39ab:0729::/64 |
Huechuraba x LSC (AURA) | Per Path | 100G | 722 | 10.7.22.0/24, fd95:26fb:39ab:0722::/64 | 723 | 10.7.23.0/24, fd95:26fb:39ab:0723::/64 |
MIA x LSC (LSST) | Per Path - BGP chosen | 100G | N/A | MIA: 198.32.252.193/31, LSC: 139.229.140.135/31 | N/A | MIA: 198.32.252.195/31, LSC: 139.229.140.137/31 |
MIA x NCSA | Per Path - BGP chosen | 100G | N/A | MIA: 198.32.252.193/31, NCSA: 141.142.129.87/31 | N/A | MIA: 198.32.252.195/31, NCSA: 141.142.129.85/31 |
LSC (LSST) x NCSA | Per Path - BGP chosen | 100G | N/A | LSC: 139.229.140.135/31, NCSA: 141.142.129.87/31 | N/A | LSC: 139.229.140.137/31, NCSA: 141.142.129.85/31 |
Perfsonar Host Configuration information
To sustain 10Gbps on a path with 100ms of RTT, about 120MB of TCP buffer space must be available (the bandwidth-delay product). Hosts configured to use jumbo frames need additional buffer space as well.
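The 120MB figure comes directly from the bandwidth-delay product (link rate × RTT). A quick sanity check in shell, using the 10Gbps / 100ms case from above:

```shell
# Bandwidth-delay product: rate (bits/s) * RTT (s) / 8 = bytes in flight.
# 10 Gbps over a 100 ms path -> 10^10 * 0.1 / 8 bytes.
bdp_bytes=$((10 * 1000 * 1000 * 1000 / 10 / 8))
echo "$bdp_bytes"   # 125000000 bytes, i.e. roughly 120 MiB
```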
Increased interface descriptors
The driver for e1000 chips (Intel 1GE, often integrated into motherboards) defaults to 256 Rx and 256 Tx descriptors, because early versions of the chipset supported no more. All recent versions support 4096, but the driver doesn't autodetect this. Increasing the number of descriptors can dramatically improve performance on some hosts.
Some sample Linux output from an Intel 10GE NIC that is using the default config is below:
[galiza@perfsonar ~]# ethtool -g eth2
Ring parameters for eth2:
Pre-set maximums:
RX: 4096
RX Mini: 0
RX Jumbo: 0
TX: 4096
Current hardware settings:
RX: 256
RX Mini: 0
RX Jumbo: 0
TX: 256
On each host, the following was added to /etc/modprobe.d/ixgbe.conf (assuming the ixgbe driver, i.e., an Intel 10G NIC):
alias eth0 ixgbe
options ixgbe allow_unsupported_sfp=1
options ixgbe RxDescriptors=4096,4096 TxDescriptors=4096,4096
You can also add this to /etc/rc.local to get the same result.
ethtool -G ethN rx 4096 tx 4096
Increased interface Maximum Transfer Unit (MTU)
Add to /etc/sysconfig/network-scripts/ifcfg-ethX
MTU=9000
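To confirm that jumbo frames actually work end-to-end, one option is to ping the remote perfSONAR host with the don't-fragment bit set and a payload sized to fill a 9000-byte packet (the target host below is just one of the nodes from the table above, used as an example):

```shell
# 8972-byte payload + 28 bytes of IPv4/ICMP headers = 9000-byte packet.
# -M do sets the don't-fragment bit, so this fails if any hop's MTU < 9000.
ping -c 3 -M do -s 8972 ps-mia.ampath.net
```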
Increased TX Queue Length on 10G NICs
Added to /etc/rc.local:
# increase txqueuelen for 10G NICS
/sbin/ifconfig ethX txqueuelen 10000
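On newer distributions that no longer ship ifconfig, the same queue length can be set with iproute2 (a sketch of the equivalent command; ethX remains the placeholder for the 10G interface):

```shell
# iproute2 equivalent of the ifconfig command above
/sbin/ip link set dev ethX txqueuelen 10000
```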
Linux Kernel TCP/IP Configuration
Added to /etc/sysctl.conf
# increase TCP max buffer size settable using setsockopt()
# allow testing with 256MB buffers
net.core.rmem_max = 268435456
net.core.wmem_max = 268435456
# increase Linux autotuning TCP buffer limits
# min, default, and max number of bytes to use
# allow auto-tuning up to 128MB buffers
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
# recommended to increase this for 10G NICs or higher
net.core.netdev_max_backlog = 250000
# don't cache ssthresh from previous connection
net.ipv4.tcp_no_metrics_save = 1
# explicitly set htcp as the congestion control: cubic is buggy in older 2.6 kernels
net.ipv4.tcp_congestion_control = htcp
# if you are using jumbo frames, also set this
net.ipv4.tcp_mtu_probing = 1
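These settings take effect once sysctl is reloaded. A quick way to apply and verify them (standard sysctl usage, nothing site-specific assumed):

```shell
# apply the new settings from /etc/sysctl.conf
sysctl -p
# verify the congestion control algorithm and buffer ceilings took effect
sysctl net.ipv4.tcp_congestion_control net.core.rmem_max net.core.wmem_max
```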
For hosts with a 10G NIC optimized for network paths up to 200ms RTT, or a 40G NIC on paths up to 50ms RTT, use these values instead of the settings above:
# increase TCP max buffer size settable using setsockopt()
net.core.rmem_max = 536870912
net.core.wmem_max = 536870912
# increase Linux autotuning TCP buffer limits
net.ipv4.tcp_rmem = 4096 87380 268435456
net.ipv4.tcp_wmem = 4096 65536 268435456
Traffic shaping
When testing from a faster host to a slower host (e.g., host A with a 10Gbps NIC and host B with a 1Gbps NIC), the receiving host may end up dropping many packets. The way to prevent this is to shape the sender's traffic with the Linux 'tc' (traffic control) command, which works reliably only up to around 5Gbps.
Example: for a 10Gbps NIC p1p2, add to /etc/rc.local:
/sbin/tc qdisc del dev p1p2 root
/sbin/tc qdisc add dev p1p2 handle 1: root htb
/sbin/tc class add dev p1p2 parent 1: classid 1:1 htb rate 980mbit
/sbin/tc filter add dev p1p2 parent 1: protocol ip prio 1 u32 match ip dst xxx.xxx.xxx.xxx/32 flowid 1:1
Where xxx.xxx.xxx.xxx is the IPv4 address of the slower destination host reached through device p1p2.
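To confirm the shaper is active and actually matching traffic, the qdisc and class counters can be inspected with tc's statistics output:

```shell
# show the htb qdisc and per-class byte/packet counters on p1p2;
# the 1:1 class counters should increase while traffic to the shaped host flows
/sbin/tc -s qdisc show dev p1p2
/sbin/tc -s class show dev p1p2
```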
MaDDash server
This tool presents monitoring data to LSST and AmLight engineers as a set of two-dimensional grids referred to as a dashboard. Through this web dashboard, LSST and AmLight engineers get a continuous, near-real-time overview of network health and can act early in case of service degradation or disruption.
https://dashboard.ampath.net/maddash-webui/index.cgi
3 Comments
Jeff Kantor
The WAN overview diagram may be correct currently in showing Los Angeles, but we recently decided that the baseline will be Boca Raton instead of LA. There is too much difficulty getting a robust, high-speed link between Panama and LA.
Jeff Kantor
Also, Tucson might not connect on that path, might be some kind of spur.
Jeff Kantor
Finally, we are going to turn in a change request on LSE-78 soon. We were waiting on the results of the Chilean Network Acquisitions Review. The materials from that review are at:
https://project.lsst.org/reviews/cnar/