perfSONAR Hosts description


Each host has three interfaces:

  • One 10G interface for throughput testing (BWCTL)
  • One 10G interface for latency and loss testing (OWAMP)
  • One 10G or 1G interface for management

Login information

City                     Node                       Mgmt IP                        Throughput   Loss
Miami (AmLight)          ps-mia.ampath.net          190.103.184.93 (p1p2.4003)     p1p2         em3
Huechuraba (AmLight)     ps-scl.sdn.amlight.net     190.103.184.113 (em1.4002)     p2p2         p2p1
São Paulo (AmLight)      ps-spo.sdn.amlight.net     200.136.41.86 (enp7s0f1)       enp11s0f0    enp11s0f1
Panamá City (AmLight)    ps-pty.sdn.amlight.net     200.0.207.47 (em1.7)           p1p2         em1
La Serena (AURA)         ps-aura.sdn.amlight.net    139.229.127.104 (em1)          p1p2         p2p2
La Serena (LSST)         perfsonar1.ls.lsst.org     139.229.135.17 (em1)           p2p1         p2p2
Champaign (NCSA)         ps-ncsa                    141.142.129.31 (em1)           em2          p2p1
Cerro Pachón (LSST)      TBD                        TBD                            TBD          TBD
Santiago (REUNA)         TBD                        TBD                            TBD          TBD

High-level perfSONAR Deployment Diagram

Configured Tests


Test                         Type                    Path BW           Throughput (VLAN / IP)        Loss (VLAN / IP)
MIA x PTY                    Per Path                100G              706 / 10.7.6.0/24             707 / 10.7.7.0/24
                                                                       fd95:26fb:39ab:0706::/64      fd95:26fb:39ab:0707::/64
MIA x Huechuraba             Per Link                10G               700 / 10.7.0.0/24             701 / 10.7.1.0/24
                                                                       fd95:26fb:39ab:0700::/64      fd95:26fb:39ab:0701::/64
MIA x Huechuraba             Per Path                100G (Atlantic)   718 / 10.7.18.0/24            719 / 10.7.19.0/24
                                                                       fd95:26fb:39ab:0718::/64      fd95:26fb:39ab:0719::/64
MIA x Huechuraba             Per Path                100G (Pacific)    716 / 10.7.16.0/24            717 / 10.7.17.0/24
                                                                       fd95:26fb:39ab:0716::/64      fd95:26fb:39ab:0717::/64
MIA x SPO                    Per Path                100G (Atlantic)   708 / 10.7.8.0/24             709 / 10.7.9.0/24
                                                                       fd95:26fb:39ab:0708::/64      fd95:26fb:39ab:0709::/64
MIA x SPO                    Per Path                100G (Monet)      724 / 10.7.24.0/24            725 / 10.7.25.0/24
                                                                       fd95:26fb:39ab:0724::/64      fd95:26fb:39ab:0725::/64
PTY x Huechuraba             Per Link                100G              702 / 10.7.2.0/24             703 / 10.7.3.0/24
                                                                       fd95:26fb:39ab:0702::/64      fd95:26fb:39ab:0703::/64
Huechuraba x SPO             Per Link                100G              704 / 10.7.4.0/24             705 / 10.7.5.0/24
                                                                       fd95:26fb:39ab:0704::/64      fd95:26fb:39ab:0705::/64
Huechuraba x LSC (LSST)      Per Path                100G              726 / 10.7.26.0/24            727 / 10.7.27.0/24
                                                                       fd95:26fb:39ab:0726::/64      fd95:26fb:39ab:0727::/64
LSC (LSST) x Cerro Pachón    Per Path                100G              728 / 10.7.28.0/24            729 / 10.7.29.0/24
                                                                       fd95:26fb:39ab:0728::/64      fd95:26fb:39ab:0729::/64
Huechuraba x LSC (AURA)      Per Path                100G              722 / 10.7.22.0/24            723 / 10.7.23.0/24
                                                                       fd95:26fb:39ab:0722::/64      fd95:26fb:39ab:0723::/64
MIA x LSC (LSST)             Per Path - BGP chosen   100G              VLAN N/A                      VLAN N/A
                                                                       MIA: 198.32.252.193/31        MIA: 198.32.252.195/31
                                                                       LSC: 139.229.140.135/31       LSC: 139.229.140.137/31
MIA x NCSA                   Per Path - BGP chosen   100G              VLAN N/A                      VLAN N/A
                                                                       MIA: 198.32.252.193/31        MIA: 198.32.252.195/31
                                                                       NCSA: 141.142.129.87/31       NCSA: 141.142.129.85/31
LSC (LSST) x NCSA            Per Path - BGP chosen   100G              VLAN N/A                      VLAN N/A
                                                                       LSC: 139.229.140.135/31       LSC: 139.229.140.137/31
                                                                       NCSA: 141.142.129.87/31       NCSA: 141.142.129.85/31
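
Outside of the scheduled mesh, one-off tests can be run by hand from any host in the table. The sketch below assumes the bwctl and owping command-line clients from the perfSONAR toolkit are installed; ps-pty.sdn.amlight.net is used purely as an illustrative target, and in practice the appropriate test-interface address from the table above should be used:

# one-off 20-second throughput test toward the Panamá City host
bwctl -T iperf3 -t 20 -c ps-pty.sdn.amlight.net

# one-off one-way latency/loss sample toward the same host
owping ps-pty.sdn.amlight.net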

perfSONAR Host Configuration information

To sustain 10 Gbps on paths with 100 ms of round-trip time, roughly 120 MB of TCP buffer space must be available. Hosts configured to use jumbo frames need additional buffer space as well.
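
That figure is simply the bandwidth-delay product of the path; a quick back-of-the-envelope check:

# bandwidth-delay product for a 10 Gbps path with 100 ms RTT
# 10,000,000,000 bit/s x 0.100 s = 1,000,000,000 bits
# 1,000,000,000 bits / 8         = 125,000,000 bytes (~120 MB)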

Increased interface descriptors

The driver for e1000 chips (Intel 1GE, often integrated into motherboards) defaults to 256 Rx and 256 Tx descriptors, because early versions of the chipset supported no more. All recent versions support 4096, but the driver does not detect this automatically. Increasing the number of descriptors can improve performance dramatically on some hosts.

Sample Linux output from an Intel 10GE NIC using the default configuration is shown below:

[galiza@perfsonar ~]# ethtool -g eth2
Ring parameters for eth2:
Pre-set maximums:
RX:         4096
RX Mini:    0
RX Jumbo:    0
TX:         4096
Current hardware settings:
RX:        256
RX Mini:     0
RX Jumbo:    0
TX:        256

On each host, the following was added to /etc/modprobe.d/ixgbe.conf (assuming the ixgbe driver for a 10G NIC):

  alias eth0 ixgbe
  options ixgbe allow_unsupported_sfp=1
  options ixgbe RxDescriptors=4096,4096 TxDescriptors=4096,4096

You can also add this to /etc/rc.local to get the same result.

  ethtool -G ethN rx 4096 tx 4096
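
The modprobe.d options only take effect the next time the driver is loaded. One way to apply them immediately (note that this briefly takes the 10G interfaces down) and confirm the result is:

# reload the ixgbe driver so the new descriptor options are read
modprobe -r ixgbe
modprobe ixgbe

# confirm the current ring sizes now match the pre-set maximums
ethtool -g ethN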

Increased interface Maximum Transmission Unit (MTU)

Add to /etc/sysconfig/network-scripts/ifcfg-ethX:

MTU=9000
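
Jumbo frames only help if every device in the path accepts them. A quick way to sanity-check a path after raising the MTU is a non-fragmenting ping with a payload sized for a 9000-byte MTU (9000 bytes minus 20 bytes of IP header and 8 bytes of ICMP header); the remote address below is a placeholder:

# replies confirm the whole path carries 9000-byte frames without fragmentation
ping -M do -s 8972 -c 4 <remote-test-address>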

Increased TX Queue Length on 10G NICs

Added to /etc/rc.local:
# increase txqueuelen for 10G NICS
/sbin/ifconfig ethX txqueuelen 10000
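
On systems where ifconfig is no longer installed, the iproute2 equivalent is:

# set and verify the transmit queue length with iproute2
/sbin/ip link set dev ethX txqueuelen 10000
/sbin/ip link show ethX     # the queue length appears as "qlen 10000"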

Linux Kernel TCP/IP Configuration

Added to /etc/sysctl.conf:
# increase TCP max buffer size settable using setsockopt()
# allow testing with 256MB buffers
net.core.rmem_max = 268435456
net.core.wmem_max = 268435456
 
# increase Linux autotuning TCP buffer limits
# min, default, and max number of bytes to use
# allow auto-tuning up to 128MB buffers
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728

# recommended to increase this for 10G NICs or higher
net.core.netdev_max_backlog = 250000
# don't cache ssthresh from previous connection
net.ipv4.tcp_no_metrics_save = 1

# Explicitly set htcp as the congestion control: cubic is buggy in older 2.6 kernels
net.ipv4.tcp_congestion_control=htcp

# If you are using Jumbo Frames, also set this
net.ipv4.tcp_mtu_probing = 1
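
The settings in /etc/sysctl.conf are read at boot; to apply them to a running host and spot-check a few of the values:

# load the new values and verify a couple of them
sysctl -p
sysctl net.ipv4.tcp_congestion_control
sysctl net.core.rmem_max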

For hosts with a 10G NIC optimized for network paths of up to 200 ms RTT, or a 40G NIC on paths of up to 50 ms RTT, use these values instead of the corresponding settings above:

# increase TCP max buffer size settable using setsockopt()
net.core.rmem_max = 536870912
net.core.wmem_max = 536870912

# increase Linux autotuning TCP buffer limit
net.ipv4.tcp_rmem = 4096 87380 268435456
net.ipv4.tcp_wmem = 4096 65536 268435456
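
As a sanity check, both cases work out to roughly the same bandwidth-delay product, which is presumably why a single set of values covers them:

# 10 Gbps x 200 ms = 2,000,000,000 bits / 8 = 250,000,000 bytes (~250 MB)
# 40 Gbps x  50 ms = 2,000,000,000 bits / 8 = 250,000,000 bytes (~250 MB)
# hence the 268435456-byte (256 MB) autotuning ceiling in tcp_rmem/tcp_wmem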

Traffic shaping

When testing from a faster host to a slower one (e.g., host A with a 10 Gbps NIC and host B with a 1 Gbps NIC), the receiving host may end up dropping many packets. The only way to prevent this is to rate-limit the sender with the Linux 'traffic control' (tc) command, which only works reliably up to around 5 Gbps.

Example: for a 10 Gbps NIC p1p2, add the following to /etc/rc.local:

/sbin/tc qdisc del dev p1p2 root
/sbin/tc qdisc add dev p1p2 handle 1: root htb
/sbin/tc class add dev p1p2 parent 1: classid 1:1 htb rate 980mbit
/sbin/tc filter add dev p1p2 parent 1: protocol ip prio 1 u32 match ip dst xxx.xxx.xxx.xxx/32 flowid 1:1

Where xxx.xxx.xxx.xxx is the IPv4 address of the destination (slower) host reached via device p1p2.
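
To confirm the shaper is actually catching the test traffic, the class and filter counters can be inspected while a test toward that destination is running:

# byte/packet counters and drops for the htb class
/sbin/tc -s class show dev p1p2

# the installed u32 filter
/sbin/tc filter show dev p1p2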



MaDDash server

This tool provides LSST and AmLight engineers with two-dimensional monitoring data presented as a set of grids, referred to as a dashboard. Through this web dashboard, LSST and AmLight engineers have a continuous, near-real-time overview of network health and can anticipate actions in case of service degradation or disruption.

https://dashboard.ampath.net/maddash-webui/index.cgi



3 Comments

  1. The WAN overview diagram may be correct currently in showing Los Angeles, but we recently decided that the baseline will be Boca Raton instead of LA.  There is too much difficulty getting a robust, high-speed link between Panama and LA.

  2. Also, Tucson might not connect on that path, might be some kind of spur.

  3. Finally, we are going to turn in a change request on LSE-78 soon.  We were waiting on the results of the Chilean Network Acquisitions Review.  The materials from that review are at:

    https://project.lsst.org/reviews/cnar/