Issue


After a new compute resource is installed, the virtual servers residing on it show very poor TCP performance.


Environment


Any OnApp version


Troubleshooting


  1. A download speed test on a virtual server showed only a few KB/s (curl reported an average of 2324 bytes/s) and estimated more than 62 hours to download a 500 MB file:

    [root@tes-ppa ~]# curl -o /dev/null http://speedtest.wdc01.softlayer.com/downloads/test500.zip      
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current                                  
                                     Dload  Upload   Total   Spent    Left  Speed   
      0  500M    0 35944    0     0   2324      0 62:39:57  0:00:15 62:39:42  2350^C
  2. The same test run from the compute resource itself was much faster, at about 16 MB/s.
  3. The following messages about LRO (Large Receive Offload) were found in /var/log/dmesg and in the output of dmesg:

    bond1.175: received packets cannot be forwarded while LRO is enabled 
    bond1.175: received packets cannot be forwarded while LRO is enabled 
    bond1.175: received packets cannot be forwarded while LRO is enabled 
    bond1.175: received packets cannot be forwarded while LRO is enabled 
    bond1.175: received packets cannot be forwarded while LRO is enabled 
    bond0.200: received packets cannot be forwarded while LRO is enabled 
    bond0.200: received packets cannot be forwarded while LRO is enabled 
    bond0.200: received packets cannot be forwarded while LRO is enabled 
    bond0.200: received packets cannot be forwarded while LRO is enabled 
    bond0.200: received packets cannot be forwarded while LRO is enabled
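
The log check in step 3 can be scripted. The following is a minimal sketch, not part of the original procedure: the function name `lro_warned_ifaces` is hypothetical, and it assumes the warning text matches the messages shown above. It reads kernel log text on standard input and prints each interface named in an LRO warning once.

```shell
#!/bin/sh
# Hypothetical helper: list every interface that appears in an
# "LRO is enabled" kernel warning, one name per line, without duplicates.
# Reads log text on stdin, so it works with `dmesg` or a saved log file.
lro_warned_ifaces() {
    awk '/received packets cannot be forwarded while LRO is enabled/ {
        # The interface name is the field just before "received";
        # starting at field 2 skips lines with no name in front.
        for (i = 2; i <= NF; i++) {
            if ($i == "received") {
                name = $(i - 1)
                sub(/:$/, "", name)   # drop the trailing colon
                print name
                break
            }
        }
    }' | sort -u
}
```

For example, `dmesg | lro_warned_ifaces` would print `bond0.200` and `bond1.175` for the log above. The current state of a NIC can then be checked with `ethtool -k eth2 | grep large-receive-offload`.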

Resolution


  1. Disable LRO on each NIC:

    ethtool -K eth2 lro off

    After LRO is disabled, TCP performance on the virtual servers should return to normal:

    [root@tes-ppa ~]# curl -o /dev/null http://speedtest.wdc01.softlayer.com/downloads/test500.zip 
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100  500M  100  500M    0     0  13.2M      0  0:00:37  0:00:37 --:--:-- 17.2M
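  2. To make the setting persistent across reboots with initscripts-9.03.27-1 or later, use ETHTOOL_OPTS on the compute resource. Note that initscripts reads ETHTOOL_OPTS from each NIC's interface configuration file (for example, /etc/sysconfig/network-scripts/ifcfg-eth0), not from rc.local, so add the matching line to the file of each interface: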

    ETHTOOL_OPTS="-K eth0 lro off" 
    ETHTOOL_OPTS="-K eth1 lro off" 
    ETHTOOL_OPTS="-K eth2 lro off" 
    ETHTOOL_OPTS="-K eth3 lro off"
  3. If the installed initscripts version is earlier than initscripts-9.03.27-1, call ethtool directly from rc.local on the compute resource:

    /sbin/ethtool -K eth0 lro off 
    /sbin/ethtool -K eth1 lro off 
    /sbin/ethtool -K eth2 lro off 
    /sbin/ethtool -K eth3 lro off 
    # ... repeat for every remaining NIC
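
When a compute resource has many NICs, the per-interface commands above can be wrapped in a loop. This is a minimal sketch, not part of the original procedure: the function name `disable_lro` and the overridable `ETHTOOL` variable are assumptions made for illustration. The only real command it runs is `ethtool -K <nic> lro off`, as in step 1.

```shell
#!/bin/sh
# Path to ethtool; can be overridden, for example for a dry run.
ETHTOOL="${ETHTOOL:-/sbin/ethtool}"

# Hypothetical helper: turn LRO off on every interface named as an argument.
# Returns non-zero if the command failed for at least one interface.
disable_lro() {
    rc=0
    for nic in "$@"; do
        if "$ETHTOOL" -K "$nic" lro off; then
            echo "LRO disabled on $nic"
        else
            echo "failed to disable LRO on $nic" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

A call such as `disable_lro eth0 eth1 eth2 eth3` covers the four NICs used in the examples. The change does not survive a reboot by itself, so one of the persistence methods in steps 2 and 3 is still needed.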