Issue


After creating an LACP bond (mode 4) with two or more NICs (up to four), all traffic appears to go through a single interface instead of being spread across all bonded interfaces.

Network performance on the bonded network is slow, or lower than expected. This was observed with four LACP NICs used for the OnApp Storage Network.


Troubleshooting


Running iperf shows that the maximum throughput equals the bandwidth of a single NIC.

 # Executed iperf -c 10.200.4.254 -N -P 4 -M 9230 on the client side:

[root@test ~]# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 4] local 10.200.1.254 port 5001 connected with 10.200.4.254 port 37718
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 1.15 GBytes 989 Mbits/sec
[ 5] local 10.200.1.254 port 5001 connected with 10.200.4.254 port 37721
[ 4] local 10.200.1.254 port 5001 connected with 10.200.4.254 port 37724
[ 6] local 10.200.1.254 port 5001 connected with 10.200.4.254 port 37722
[ 7] local 10.200.1.254 port 5001 connected with 10.200.4.254 port 37723
[ 6] 0.0-10.0 sec 382 MBytes 319 Mbits/sec
[ 5] 0.0-10.1 sec 368 MBytes 307 Mbits/sec
[ 4] 0.0-10.1 sec 295 MBytes 246 Mbits/sec
[ 7] 0.0-10.1 sec 137 MBytes 114 Mbits/sec
[SUM] 0.0-10.1 sec 1.16 GBytes 987 Mbits/sec
[ 8] local 10.200.1.254 port 5001 connected with 10.200.4.254 port 37730
[ 8] 0.0-10.0 sec 1.15 GBytes 989 Mbits/sec
[ 4] local 10.200.1.254 port 5001 connected with 10.200.4.254 port 37735
[ 4] 0.0-10.0 sec 1.15 GBytes 984 Mbits/sec
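To confirm that the traffic really leaves through a single NIC, the per-slave transmit counters in sysfs can be compared before and after an iperf run on the sending host. The following is a minimal sketch: the function name `slave_tx_bytes` and its optional second argument (an alternative sysfs root, useful only for trying the function against a copy) are assumptions for illustration, not part of any OnApp tooling.

```shell
#!/bin/sh
# Print "<slave> <tx_bytes>" for every slave of a bond.
# $1: bond name
# $2: sysfs root (hypothetical parameter, defaults to /sys)
slave_tx_bytes() {
    local bond=$1 root=${2:-/sys}
    local slave
    # bonding/slaves holds a space-separated list of member interfaces
    for slave in $(cat "$root/class/net/$bond/bonding/slaves"); do
        echo "$slave $(cat "$root/class/net/$slave/statistics/tx_bytes")"
    done
}

# Only query the real bond if it exists on this host
if [ -r /sys/class/net/onappstorebond/bonding/slaves ]; then
    slave_tx_bytes onappstorebond
fi
```

If only one counter grows noticeably between two calls made around an iperf run, the outgoing streams are being hashed onto a single slave.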


Cause


According to https://www.kernel.org/doc/Documentation/networking/bonding.txt, mode 4 uses all slaves in the active aggregator. The slave for outgoing traffic is selected according to the transmit hash policy:

802.3ad or 4 

				IEEE 802.3ad Dynamic link aggregation. Creates 
				aggregation groups that share the same speed and 
				duplex settings. Utilizes all slaves in the active 
				aggregator according to the 802.3ad specification. 

				Slave selection for outgoing traffic is done according 
				to the transmit hash policy, which may be changed from 
				the default simple XOR policy via the xmit_hash_policy 
				option, documented below. Note that not all transmit 
				policies may be 802.3ad compliant, particularly in 
				regards to the packet mis-ordering requirements of 
				section 43.2.4 of the 802.3ad standard. Differing 
				peer implementations will have varying tolerances for 
				noncompliance. 

				Prerequisites: 

				1. Ethtool support in the base drivers for retrieving 
				the speed and duplex of each slave. 

				2. A switch that supports IEEE 802.3ad Dynamic link 
				aggregation. 

				Most switches will require some type of configuration 
				to enable 802.3ad mode.
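The effect of the transmit hash policy on parallel streams can be illustrated with a simplified model. The sketch below loosely follows the hash descriptions in bonding.txt (XOR of the inputs, two folding shifts, then modulo the slave count); it is not the kernel's exact implementation, and the function names are hypothetical. The MAC and packet type terms of layer2+3 are left out because they are constant between two given hosts. The addresses and ports are taken from the iperf output in this article.

```shell
#!/bin/sh
# Simplified model of bonding transmit hashing. NOT the kernel's exact code.

# Convert a dotted IPv4 address to an integer
ip_to_int() {
    local IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Fold the hash as described in bonding.txt, then reduce modulo slave count
fold_mod() {
    local h=$1 n=$2
    h=$(( h ^ (h >> 16) ))
    h=$(( h ^ (h >> 8) ))
    echo $(( h % n ))
}

# layer2+3 sketch: only addresses feed the hash (args: src_ip dst_ip slaves)
hash_l23() {
    fold_mod $(( $(ip_to_int "$1") ^ $(ip_to_int "$2") )) "$3"
}

# layer3+4 sketch: source and destination ports are mixed in as well
# (args: src_ip src_port dst_ip dst_port slaves)
hash_l34() {
    local ports=$(( ($2 << 16) | $4 ))
    fold_mod $(( ports ^ $(ip_to_int "$1") ^ $(ip_to_int "$3") )) "$5"
}

# Four parallel iperf streams between the same two hosts
for port in 38031 38032 38033 38034; do
    l23=$(hash_l23 10.200.4.254 10.200.1.254 4)
    l34=$(hash_l34 10.200.4.254 "$port" 10.200.1.254 5001 4)
    echo "src port $port: layer2+3 -> slave $l23, layer3+4 -> slave $l34"
done
```

In this model the layer2+3 result cannot change between the streams, because the source port is not an input. With layer3+4 the differing source ports can select different slaves.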

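Policies that hash only on MAC and IP addresses (layer2, layer2+3) send all traffic between the same two hosts through the same slave, which is why the parallel iperf streams in the test share the bandwidth of one NIC. Depending on your switch configuration, check the following:

  • Whether the EtherChannel (link aggregation) configuration on the switch is correct
  • Whether a different xmit_hash_policy distributes the traffic better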
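On the Linux side, the current bond settings can be read from sysfs before changing anything. A minimal sketch, assuming the bond is named onappstorebond as elsewhere in this article; the function name `bond_summary` and its optional sysfs-root argument are hypothetical.

```shell
#!/bin/sh
# Print mode, transmit hash policy and member interfaces of a bond.
# $1: bond name
# $2: sysfs root (hypothetical parameter, defaults to /sys)
bond_summary() {
    local bond=$1 root=${2:-/sys}
    local dir="$root/class/net/$bond/bonding"
    echo "mode: $(cat "$dir/mode")"
    echo "xmit_hash_policy: $(cat "$dir/xmit_hash_policy")"
    echo "slaves: $(cat "$dir/slaves")"
}

# Only query the real bond if it exists on this host
if [ -d /sys/class/net/onappstorebond/bonding ]; then
    bond_summary onappstorebond
fi
```

More detail, including the aggregator each slave belongs to, is available in /proc/net/bonding/onappstorebond.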


Resolution


As an example, the xmit_hash_policy is changed here from layer2+3 to layer3+4:

[root@test ~]# ifdown onappstorebond
[root@test ~]# echo "layer3+4" > /sys/class/net/onappstorebond/bonding/xmit_hash_policy
[root@test ~]# ifup onappstorebond
[root@test ~]# iperf -s

[root@test1 ~]# iperf -c 10.200.1.254 -P 4 -M 9000
WARNING: attempt to set TCP maximum segment size to 9000, but got 536
WARNING: attempt to set TCP maximum segment size to 9000, but got 536
WARNING: attempt to set TCP maximum segment size to 9000, but got 536
WARNING: attempt to set TCP maximum segment size to 9000, but got 536
------------------------------------------------------------
Client connecting to 10.200.1.254, TCP port 5001
------------------------------------------------------------
[ 4] local 10.200.4.254 port 38032 connected with 10.200.1.254 port 5001
[ 6] local 10.200.4.254 port 38034 connected with 10.200.1.254 port 5001
[ 5] local 10.200.4.254 port 38033 connected with 10.200.1.254 port 5001
[ 3] local 10.200.4.254 port 38031 connected with 10.200.1.254 port 5001
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 1.15 GBytes 989 Mbits/sec
[ 6] 0.0-10.0 sec 1.15 GBytes 991 Mbits/sec
[ 5] 0.0-10.0 sec 1.15 GBytes 984 Mbits/sec
[ 3] 0.0-10.0 sec 1.15 GBytes 986 Mbits/sec
[SUM] 0.0-10.0 sec 4.60 GBytes 3.95 Gbits/sec
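The sysfs write shown above can be wrapped in a small helper that validates the policy name and reads the value back. This is only a sketch: `set_hash_policy` and its optional third argument are hypothetical, and only the three policy names discussed in this article are accepted, although the kernel may support more. Bringing the bond down and up is left to the caller, as in the manual steps.

```shell
#!/bin/sh
# Write a transmit hash policy to a bond and verify it was accepted.
# $1: bond name, $2: policy name
# $3: sysfs root (hypothetical parameter, defaults to /sys)
set_hash_policy() {
    local bond=$1 policy=$2 root=${3:-/sys}
    local file="$root/class/net/$bond/bonding/xmit_hash_policy"
    local current
    case $policy in
        layer2|layer2+3|layer3+4) ;;
        *) echo "unsupported policy: $policy" >&2; return 1 ;;
    esac
    [ -w "$file" ] || { echo "cannot write $file" >&2; return 1; }
    echo "$policy" > "$file" || return 1
    # sysfs reports "<name> <number>"; compare only the name
    read -r current _ < "$file"
    [ "$current" = "$policy" ]
}

# Possible usage on the host, mirroring the manual steps:
#   ifdown onappstorebond
#   set_hash_policy onappstorebond layer3+4 && echo "policy applied"
#   ifup onappstorebond
```

A change made this way affects the running bond only.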

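layer3+4 is not fully 802.3ad compliant: a single TCP or UDP conversation containing both fragmented and unfragmented packets can be striped across two interfaces, which may result in out-of-order delivery. Check that your application handles out-of-order packets correctly.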