Issue


Freshly installed CentOS 6.x / Xen 4 compute resources hang, even when no virtual servers are running, with the following message:

kernel:BUG: soft lockup - CPU#16 stuck for 22s! [stress:6229]

Message from syslogd@HV3-cloud at Aug 30 09:56:27 ...
 kernel:BUG: soft lockup - CPU#16 stuck for 22s! [stress:6229]
CODE


Dmesg Output

Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff81140a53>] exit_mmap+0xe3/0x160
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff8104fde4>] mmput+0x64/0x140
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff81056d25>] exit_mm+0x105/0x130
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff81056fcd>] do_exit+0x16d/0x450
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff8113df2c>] ? handle_pte_fault+0x1ec/0x210
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff81057305>] do_group_exit+0x55/0xd0
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff81067294>] get_signal_to_deliver+0x224/0x4d0
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff8101489b>] do_signal+0x5b/0x140
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff8126f17d>] ? rb_insert_color+0x9d/0x160
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff81083863>] ? finish_task_switch+0x53/0xe0
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff81576fe7>] ? __schedule+0x3f7/0x710
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff810149e5>] do_notify_resume+0x65/0x80
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff8157862c>] retint_signal+0x48/0x8c
 Aug 30 09:59:00 HV3-cloud kernel: Code: cc 51 41 53 b8 10 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 11 00 00 00 0f    05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
 Aug 30 09:59:00 HV3-cloud kernel: Call Trace:
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff81009e2d>] ? xen_force_evtchn_callback+0xd/0x10
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff8100a632>] check_events+0x12/0x20
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff8100a61f>] ? xen_restore_fl_direct_reloc+0x4/0x4
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff8111dc06>] ? free_hot_cold_page+0x126/0x1b0
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff81005660>] ? xen_get_user_pgd+0x40/0x80
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff8111dfe4>] free_hot_cold_page_list+0x54/0xa0
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff81121b18>] release_pages+0x1b8/0x220
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff8114da64>] free_pages_and_swap_cache+0xb4/0xe0
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff81268da1>] ? cpumask_any_but+0x31/0x50
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff81139bbc>] tlb_flush_mmu+0x6c/0x90
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff8113a0a4>] tlb_finish_mmu+0x14/0x40
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff81140a53>] exit_mmap+0xe3/0x160
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff8104fde4>] mmput+0x64/0x140
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff81056d25>] exit_mm+0x105/0x130
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff81056fcd>] do_exit+0x16d/0x450
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff8113df2c>] ? handle_pte_fault+0x1ec/0x210
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff81057305>] do_group_exit+0x55/0xd0
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff81067294>] get_signal_to_deliver+0x224/0x4d0
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff8101489b>] do_signal+0x5b/0x140
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff8126f17d>] ? rb_insert_color+0x9d/0x160
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff81083863>] ? finish_task_switch+0x53/0xe0
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff81576fe7>] ? __schedule+0x3f7/0x710
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff810149e5>] do_notify_resume+0x65/0x80
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff8157862c>] retint_signal+0x48/0x8c
 Aug 30 09:59:02 HV3-cloud kernel: BUG: soft lockup - CPU#5 stuck for 22s! [stress:6233]
 Aug 30 09:59:02 HV3-cloud kernel: Modules linked in: arptable_filter arp_tables ip6t_REJECT ip6table_mangle ipt_REJECT iptable_filter ip_tables bridge stp llc  xen_pciback xen_gntalloc bonding nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi dm_round_robin dm_multipath xen_acpi_processor blktap xen_netback xen_blkback xen_gntdev xen_evtchn xenfs xen_privcmd ufs(O) coretemp hwmon crc32c_intel ghash_clmulni_intel aesni_intel cryptd aes_x86_64 aes_generic microcode pcspkr sb_edac edac_core joydev i2c_i801 sg iTCO_wdt iTCO_vendor_support igb evdev ixgbe mdio ioatdma myri10ge dca  ext4 mbcache jbd2 raid1 sd_mod crc_t10dif ahci libahci isci libsas scsi_transport_sas wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
 Aug 30 09:59:02 HV3-cloud kernel: CPU 5
CODE


Resolution


Use the following instructions to handle crashes of RHEL/CentOS 6.x compute resources running Xen 4.2.x:

  1. Check the ratelimit and tslice values for the CPU pool (CentOS 6 / Xen 4 only):

    root@xen4hv1 ~# xl -f sched-credit
    CODE
  2. Set the values as provided below:

    root@xen4hv1 ~# xl -f sched-credit -s -t 5ms -r 100us
    CODE

    OR

    root@xen4hv1 ~# service xend stop
    
    root@xen4hv1 ~# xl -f sched-credit -s -t 5ms -r 100us
    
    root@xen4hv1 ~# service xend start
    CODE
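As a sanity check before applying the command above: the credit scheduler requires the ratelimit to be no larger than the timeslice, and the values in step 2 (5 ms timeslice, 100 µs ratelimit) satisfy this. A minimal sketch of that check, with both values converted to microseconds (variable names are illustrative):

```shell
# Credit-scheduler constraint: ratelimit must not exceed tslice.
# Values mirror the command above: -t 5ms, -r 100us (in microseconds).
tslice_us=5000
ratelimit_us=100
if [ "$ratelimit_us" -le "$tslice_us" ]; then
    echo "OK: ratelimit ${ratelimit_us}us <= tslice ${tslice_us}us"
else
    echo "INVALID: ratelimit ${ratelimit_us}us exceeds tslice ${tslice_us}us"
fi
```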
  3. Set the default credit scheduler CAP and weight values for Domain-0:

    # xm sched-credit -d Domain-0 -w <WEIGHT> -c <CAP>
    CODE

    Where:

    WEIGHT = 600 for small compute resources, or cpu_cores/2*100 for large compute resources;

    CAP = 0 for small compute resources with few virtual servers and low CPU overselling, or cpu_cores/2*100 for large compute resources with heavy CPU overselling.

    For example, for a compute resource with eight cores, it can be set as provided below:

    # xm sched-credit -d Domain-0 -w 600 -c 0
    CODE

    Alternatively, the weight can be set to 6000, as in the following example (especially if CPU overselling is disabled in the Control Panel UI):

    # xm sched-credit -d Domain-0 -w 6000 -c 0
    CODE

    Change the <CAP> and <WEIGHT> values in the /onapp/onapp-hv.conf file:

    # vi /onapp/onapp-hv.conf
    XEN_DOM0_SCHEDULER_WEIGHT=<WEIGHT>
    XEN_DOM0_SCHEDULER_CAP=<CAP>
    CODE
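    The WEIGHT/CAP sizing rule above can be sketched as a small helper. Note that the original guidance does not define where "small" ends and "large" begins, so the 8-core cutoff below is an assumption, as is the example core count:

    ```shell
    # Sketch of the sizing rule from step 3. The <=8-core "small" threshold
    # is an assumption, not part of the original guidance.
    cores=16                            # example: a large, heavily oversold box
    if [ "$cores" -le 8 ]; then
        weight=600                      # small compute resource
        cap=0                           # no cap when overselling is low
    else
        weight=$(( cores / 2 * 100 ))   # cpu_cores/2*100
        cap=$(( cores / 2 * 100 ))
    fi
    echo "xm sched-credit -d Domain-0 -w $weight -c $cap"
    ```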
  4. Assign a fixed number of vCPUs to Domain-0 in /boot/grub/grub.conf:

    # grep dom0 /boot/grub/grub.conf
    kernel /xen.gz dom0_mem=409600 dom0_max_vcpus=2
    CODE
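    If the Xen kernel line does not yet set dom0_max_vcpus, it can be appended with sed. The sketch below deliberately works on a throwaway sample file; the /tmp path and sample kernel line are illustrative assumptions — adapt the sed command to the real /boot/grub/grub.conf only after taking a backup:

    ```shell
    # Demonstrate the edit on a sample copy rather than the live grub.conf.
    grub=/tmp/grub.conf.demo
    cat > "$grub" <<'EOF'
    kernel /xen.gz dom0_mem=409600
    EOF
    # Append dom0_max_vcpus=2 to any xen.gz line that does not already set it.
    sed -i '/xen\.gz/{/dom0_max_vcpus/!s/$/ dom0_max_vcpus=2/}' "$grub"
    grep dom0 "$grub"
    ```

    Running the sed command a second time is safe: the inner `/dom0_max_vcpus/!` guard skips lines that already carry the option.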
  5. Change the maximum number of vCPUs value in the /onapp/onapp-hv.conf file:

    # vi /onapp/onapp-hv.conf
    
    XEN_DOM0_MAX_VCPUS=2
    CODE

    Please note that a system reboot is required to apply these changes.

Cause


During compute resource installation, we set some parameters on Dom0 to share CPU fairly between the virtual servers. This worked well on previous Xen/CentOS versions, but on the current Xen 4 / CentOS releases it appears to starve Dom0 on machines that scale down CPU power.

 # This is the parameter set by default:
xm sched-credit -d 0 -c 200
CODE

Here -d selects the domain and -c sets the cap: the maximum amount of CPU the domain will be able to consume, even if the host system has idle CPU cycles. The cap is expressed as a percentage of one physical CPU: 100 is 1 physical CPU, 50 is half a CPU, 400 is 4 CPUs, etc.

[root@ ~]# xm sched-credit
Name          ID  Weight  Cap
Domain-0       0   65535  200
CODE

This default cap appears to starve Dom0's CPU on servers that scale down CPU power when cores are not fully loaded.
From http://wiki.xen.org/wiki/Credit_Scheduler:

The Cap optionally fixes the maximum amount of CPU a domain will be able to consume, even if the host system has idle CPU cycles. The cap is expressed in percentage of one physical CPU: 100 is 1 physical CPU, 50 is half a CPU, 400 is 4 CPUs, etc. The default, 0, means there is no upper cap. 

Many systems have features that scale down the computing power of a CPU that is not 100% utilized. This can be in the operating system, but can also be below the operating system, in the BIOS. If you set a cap with individual cores running at less than 100%, this may have an impact on the performance of your workload over and above the impact of the cap. For example, if your processor runs at 2 GHz, and you cap a VS at 50%, the power management system may also reduce the clock speed to 1 GHz; the effect will be that your VS gets 25% of the available power (50% of 1 GHz) rather than 50% (50% of 2 GHz). If you are not getting the performance you expect, look at the performance and cpufreq options in your operating system and your BIOS.
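The 2 GHz example above reduces to simple arithmetic: the effective share of nominal CPU is the cap multiplied by the frequency-scaling ratio. A sketch with the numbers from the quote:

```shell
# Effective CPU share = cap% x (scaled frequency / nominal frequency).
nominal_mhz=2000     # processor's rated clock
scaled_mhz=1000      # clock after cpufreq scales it down
cap_pct=50           # credit-scheduler cap on the VS
effective_pct=$(( cap_pct * scaled_mhz / nominal_mhz ))
echo "effective share of nominal CPU: ${effective_pct}%"
```

With these inputs the virtual server gets 25% of nominal capacity, not the 50% the cap suggests — which is why a cap of 0 (no cap) is recommended above when frequency scaling is active.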