CloudBoot Troubleshooting Checklist
CloudBoot and Integrated Storage Checklist
Package Install Checks
Are the integrated storage, CloudBoot images, and Control Panel server RPMs installed on the Control Panel server?
rpm -qa | egrep "onapp-store-install|onapp-cp-[4,5,6]|onapp-ramdisk"
- Are their versions up to date? Refer to the Updates and Packages Versions page.
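The package check above could be scripted roughly as follows. This is a minimal sketch, not an OnApp tool: missing_onapp_packages is a hypothetical helper that reads a package list on stdin and prints each pattern from the rpm command above that has no match. It assumes the bracket expression was meant to match major versions 4, 5 and 6.

```shell
#!/bin/sh
# Hypothetical helper (not part of OnApp): reads a package list on stdin,
# e.g. the output of `rpm -qa`, and prints every expected package pattern
# that has no match. Patterns come from the rpm command above; [456] is
# the source's [4,5,6] without the literal comma (an assumption).
missing_onapp_packages() {
    list=$(cat)
    for pattern in 'onapp-store-install' 'onapp-cp-[456]' 'onapp-ramdisk'; do
        if ! printf '%s\n' "$list" | grep -Eq "$pattern"; then
            printf 'missing: %s\n' "$pattern"
        fi
    done
}

# Usage on the Control Panel server:
#   rpm -qa | missing_onapp_packages
```

No output means all three package families were found; version currency still has to be checked against the Updates and Packages Versions page.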
Pre-Checks of CloudBoot Environment
- Does the Control Panel server have an additional NIC configured for compute resource management?
- What is the IP address of this compute resource management NIC?
- Is CloudBoot enabled in the Control Panel > Admin > Settings > Configuration wizard?
- Do the Static Config Target and CP Server Cloudboot Target fields have the IP address as determined in Step 2?
- Have the CloudBoot IP addresses + netmask been entered in the Compute Resource Settings window? (Control Panel > Admin > Settings > Compute Resources + CloudBoot IPs menu)
- Does the address of Step 2 fall within the network IP address range?
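The last check in this list, whether the management NIC address falls within the CloudBoot network range, can be verified with a little integer arithmetic. The sketch below assumes IPv4 addresses written without leading zeros; ip_to_int and ip_in_subnet are hypothetical helper names, and the addresses in the usage comment are example values only.

```shell
#!/bin/sh
# Hypothetical helpers (not part of OnApp) for checking that an IPv4
# address falls inside a network given by a base address and a netmask.

# Convert a dotted-quad IPv4 address to a single integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Print "inside" or "outside" for: ip_in_subnet <ip> <network> <netmask>
ip_in_subnet() {
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "$2")
    mask=$(ip_to_int "$3")
    # Two addresses share a network when their masked values are equal.
    if [ $(( $ip & $mask )) -eq $(( $net & $mask )) ]; then
        echo inside
    else
        echo outside
    fi
}

# Usage (example values):
#   ip_in_subnet 192.168.1.1 192.168.1.0 255.255.255.0
```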
Pre-Check of DHCP, TFTP and NFS Server Settings
- Verify the contents of /home/onapp/dhcpd.conf. Does it contain the correct network settings as identified above? (Later versions may point to /onapp/configuration/dhcp/dhcpd.conf instead, but should include the statement in /etc/dhcp/dhcpd.conf.)
- Check if the tftpboot service is enabled and running (grep disable /etc/xinetd.d/tftp should be
- Check if the NFS exports are set up correctly:
PROMPT> cat /etc/exports
/onapp/templates <MGT SUBNET>(ro,no_root_squash)
/tftpboot/export <MGT SUBNET>(ro,no_root_squash)
/tftpboot/images/centos5/diskless/snapshot <MGT SUBNET>(rw,sync,no_root_squash)
<MGT SUBNET> should match the compute resource management subnet set up in the first instance.
- Check if the nfs service is running correctly (
Pre-Check of PXE Boot Templates
Do the default template files exist?
PROMPT> ls -1 /tftpboot/pxelinux.cfg/template-*
/tftpboot/pxelinux.cfg/template-default
/tftpboot/pxelinux.cfg/template-kvm
/tftpboot/pxelinux.cfg/template-xen
/tftpboot/pxelinux.cfg/template-xen-without-onappstore
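The same template check can be made explicit so that a missing file is reported by name. This is a minimal sketch: check_pxe_templates is a hypothetical helper, and the four file names are those shown in the listing above.

```shell
#!/bin/sh
# Hypothetical helper (not part of OnApp): checks that the four default
# PXE template files listed above exist in the given directory, prints
# any that are absent, and returns non-zero if at least one is missing.
check_pxe_templates() {
    dir=${1:-/tftpboot/pxelinux.cfg}
    status=0
    for name in template-default template-kvm template-xen \
                template-xen-without-onappstore; do
        if [ ! -f "$dir/$name" ]; then
            echo "missing: $dir/$name"
            status=1
        fi
    done
    return $status
}

# Usage on the CP server (defaults to /tftpboot/pxelinux.cfg):
#   check_pxe_templates
```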
Do the servers boot off a NIC other than eth0? If so, you should edit the default template and add the ETHERNET=<ETH DEV> parameter:
PROMPT> cat /tftpboot/pxelinux.cfg/default
default centos7-ramdisk-default
label centos7-ramdisk-default
kernel images/centos7/ramdisk-default/vmlinuz
append initrd=images/centos7/ramdisk-default/initrd.img NFSNODEID=default NFSROOT=192.168.1.1:/tftpboot/export/centos7/default CFGROOT=192.168.1.1:/tftpboot/images/centos5/diskless/snapshot ADDTOBRIDGE=mgt pcie_aspm=off selinux=0 cgroup_disable=memory net.ifnames=0 biosdevname=0
ipappend 2
We recommend that you make the same change to the /tftpboot/pxelinux.cfg/template-default file to make the change persistent across the CP UI.
Boot Time Visual Check of a Server from the Server Console
- Do you see the server attempting to PXE boot and acquire a DHCP address before looking for an internal storage drive?
- Is it requesting a DHCP address on the correct Ethernet device, the one attached to the CP server management subnet?
- Does it successfully acquire an IP address that matches the one you entered in the UI?
Run grep DHCP /var/log/messages and verify whether there is a recent entry for the new compute resource, its MAC address, and the assigned IP address.
- Does the MAC address appear in the /var/lib/dhcpd/dhcp.leases file on the CP server?
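The leases lookup in the last item can be sketched as a small function. It assumes the ISC dhcpd leases format, in which each lease records the client as "hardware ethernet <mac>;". mac_in_leases is a hypothetical helper and the MAC address in the usage comment is a placeholder. The leases path is passed as an argument because stock ISC dhcpd installations commonly name the file dhcpd.leases, so use whichever path exists on your CP server.

```shell
#!/bin/sh
# Hypothetical helper (not part of OnApp): reports whether a MAC address
# appears in an ISC dhcpd leases file. Lease entries record the client as
# "hardware ethernet <mac>;", so a case-insensitive fixed-string search
# for that phrase is enough.
mac_in_leases() {
    mac=$1
    leases=$2
    if grep -Fiq "hardware ethernet $mac;" "$leases"; then
        echo found
    else
        echo "not found"
    fi
}

# Usage on the CP server (placeholder MAC; adjust the path if needed):
#   mac_in_leases 00:11:22:33:44:55 /var/lib/dhcpd/dhcp.leases
```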
Enabling Additional Debug for the PXE Boot Process
- Edit the /etc/xinetd.d/tftp file and adjust server_args to match the following line:
server_args = -v -v -s /tftpboot
- Restart the
You should now see additional logging in the /var/log/messages log whenever a server boots.
All logs from the compute resource are available on the CP in
Post-Bootup Analysis Once Server Has Successfully Booted to a Prompt
- What is the output from ifconfig at the terminal?
- Does the MAC address show up in the drop-down menu in the Add new cloudboot compute resource wizard?
- Can you log in to the server over SSH from the CP server? Using the IP address of the NIC shown in the ifconfig output from step 1 above, issue:
ssh root@<IP ADDRESS>
After you accept the host key, it should log in without requiring a password.
- If you can select the MAC address and go through to the next stage of the wizard, try booting the node as XEN and as KVM in separate tests. Check that the node comes up cleanly with the correct IP address assigned in both cases, and that after bootup it shows as active in the CP server UI.
Verifying Integrated Storage Configuration and Status
- Verify that all HVs in the same zone have all NICs assigned to the SAN attached to the same logical subnet.
- Ensure that all compute resources in the same zone are of the same type (XEN or KVM).
- Use a different channel for the storage SAN between different zones to ensure connections are logically separate.
- When adding new compute resources, remember to select the Format disks option in the UI to initialize all new drives.
- Check that the Diagnostics page in the Integrated Storage section of the UI reports no errors.
- Log in to one of the compute resources and use the storage CLI to list the storage nodes that are visible ('onappstore nodes').
- Make sure all the nodes are accessible over IP.
Storage Nodes Not Visible in the UI or from the CLI
1. Start by disabling all hardware passthrough for the compute resources and see if the drives appear. If they do, then the issue probably lies in the hardware configuration.
- Try each of the hardware enablement options (memory alignment, no_pci_disable) and check if that makes a difference for detecting drives.
- Contact the OnApp Support team and supply a detailed hardware description, including the motherboard, storage controllers, and attached drives.
2. When booted as KVM, log in to the compute resource and verify what drives are visible locally (ls -lh /dev/sd*; ls -lh /dev/cciss/).
3. When booted as KVM, log in to the compute resource and verify whether the IO controller VM is running (virsh dominfo STORAGENODE).
Storage Nodes Only Partially Visible
- Verify all network connections and make sure they are connected to the correct logical subnets.
- Verify any bonded connections and make sure that all members of the bond are connected to the same logical subnet.
- Make sure all compute resources are utilizing the same storage channel.
- Check that the same MTU is set on all compute resources and that jumbo frames are enabled on the switches.
- Check if multicast is allowed on your network equipment.
- Another possible cause is insufficient free space inside the storage controllers, especially if the Overcommit feature is enabled.
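One way to test the MTU item in the list above is to send pings that exactly fill the configured MTU with fragmentation prohibited; if jumbo frames are not enabled end to end, such pings fail. The sketch below only computes the payload size and prints the command. Both function names are hypothetical, the MTU of 9000 and the address are example values, and the -M do option assumes the Linux iputils version of ping.

```shell
#!/bin/sh
# Hypothetical helper: largest ICMP echo payload that fits in a single
# unfragmented IPv4 packet for a given MTU (20-byte IPv4 header without
# options plus 8-byte ICMP header).
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Hypothetical helper: print (not run) a Linux iputils ping command that
# sends packets of exactly <mtu> bytes with fragmentation prohibited.
# Usage: mtu_probe_command <mtu> <target ip>
mtu_probe_command() {
    echo "ping -M do -c 3 -s $(icmp_payload_for_mtu "$1") $2"
}

# Usage (example values; substitute your MTU and a storage NIC address):
#   mtu_probe_command 9000 10.200.1.2
```

Run the printed command between each pair of compute resources on the storage network; replies indicate that the path carries frames of that size.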