Question


How do I configure a highly available iSCSI + multipath data store using Nexenta SAN in OnApp?


Assumptions


Your compute resources run CentOS 6 or CentOS 5. All your compute resources and your backup server are able to access all the targets on the SAN (the network is configured). The compute resources and the SAN each have two dedicated NICs, with each compute resource NIC in the same subnet as the corresponding SAN NIC (e.g., 10.99.140.41 and 10.99.139.41).


Objectives


This article is based on the NexentaStor High-Availability iSCSI with Linux Clients page, adapted to our environment; it also explains how to configure the resulting paths and create the data stores within OnApp. It is provided as is and is not supported by our support staff. If you have any doubts about HA or related topics, ask your hardware vendor to review the configuration.


Settings in Use


iSCSI target names:

iqn.2010-08.org.example:onappiscsi
iqn.1986-03.com.nexenta:ecd714e43149
CODE

SCSI (NEXENTASTOR) target portal group name(s) and associated IP(s), TPG / IP:port:

tpg_lab9_e1000g1 / 10.99.140.41:3260
tpg_lab9_e1000g2 / 10.99.139.41:3260
CODE

SCSI (NEXENTASTOR) target group name(s):

tg_onapp
CODE


SCSI (NEXENTASTOR) initiator group(s):

hg_onapp

Initiator(s) from the CentOS client

iqn.2011-10.test:1:server-testonapp-hv1
CODE


Answer


Steps on Nexenta SAN

Perform the following steps via the Management GUI (NMV):

  1. Create the target portal groups that the iSCSI targets will be bound to.
    Navigate to Data Management > SCSI Target/Target Plus > iSCSI > Target Portal Groups. Click Create under Manage iSCSI TPGs.
  2. Enter the name (in our case, tpg_lab9_e1000g1) in the Name field and the IP Address:Port (in our case, 10.99.140.41:3260) in the Addresses field, and then click Create. Use a descriptive name for target portal groups, including the name of the network interface to which the group is bound.
  3. Repeat the above step for the second target portal group. Each group is bound to a single IP address. If more than two interfaces are used, repeat the actions for each interface/IP.
    Add iSCSI targets, target groups, and initiator groups.
    Navigate to Data Management > SCSI Target/Target Plus > iSCSI > Targets. Click Create under Manage iSCSI Targets.
    Enter the name (in our case, iqn.2010-08.org.example:onappiscsi) in the Name field. You can leave this field empty to autogenerate a name.
    Aliases are optional; you can set an alias for each target if you wish.
  4. Under iSCSI Target Portal Groups, select one of the target portal groups created earlier. For example, you can include :01: and :02: in the target names to designate the first and second targets: the first target is bound to tpg_lab9_e1000g1 and the second to tpg_lab9_e1000g2. Click Create, and repeat for the second target with the second target portal group.
  5. Navigate to Data Management > SCSI Target/Target Plus > SCSI Target > Target Groups. Click Create under Manage Target Groups.
  6. Enter the name (in this case, tg_onapp) in the Name field and select all targets (in this case, select two targets you created earlier), and then click Create.
    Navigate to Data Management > SCSI Target/Target Plus > SCSI Target > Initiator Groups and click Create.
  7. Enter the name (in this case, hg_onapp) in the Name field, and if initiator names are already known, you can enter them manually in the Additional Initiators field. Otherwise, you can update this group later after logging in from your client(s). Then click Create.
  8. Create mappings. You should have ZVOLs created prior to setting up mappings; refer to the User Guide for further details on the creation of ZVOLs and mappings. For the purposes of this document, we are using four ZVOLs with LUN IDs 10 through 13, hg_onapp as the host group, and tg_onapp as the target group (a command-line sketch of this step follows the list).
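
    For reference, a minimal command-line sketch of the same step from the appliance's expert-mode shell, using the standard ZFS and COMSTAR tools; the pool/volume name onapp-pool/ds01, the size, and the LU GUID shown are hypothetical examples, and the NMV GUI remains the supported way to do this:

    # From the appliance's expert-mode (bash) shell; NMV is the supported path.
    zfs create -V 4T onapp-pool/ds01                 # hypothetical pool/volume name
    sbdadm create-lu /dev/zvol/rdsk/onapp-pool/ds01  # register the ZVOL as a COMSTAR LU
    stmfadm list-lu                                  # note the GUID of the new LU
    stmfadm add-view -h hg_onapp -t tg_onapp -n 10 600144f05415410000005077d06b0002
    CODE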

Steps for Client

  1. Quick validation of the iscsid service is necessary to make sure that it is set up correctly. The chkconfig --list iscsid command returns the state of the iscsid service; we expect it to be enabled in runlevels 3, 4, and 5. If it is not enabled, run chkconfig iscsid on, which assumes the defaults and enables the service in runlevels 3, 4, and 5 (see the commands after the output below):

    Please note that different distributions of Linux may have different names for services and tools/methods to enable/disable automatic startup of services.

    onapp-test# chkconfig --list iscsid
    iscsid          0:off   1:off   2:off   3:on    4:on    5:on    6:off
    CODE
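
    If the service is not enabled, the following is a minimal sketch of enabling and starting it, assuming the stock sysvinit service name on CentOS 5/6:

    onapp-test# chkconfig iscsid on       # enable in the default runlevels (3, 4, 5)
    onapp-test# service iscsid start      # start the daemon now
    CODE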

        

  2. Check that the multipathd service works correctly. The mpathconf utility returns information about the state of the multipath configuration (an alternative check for clients without mpathconf follows the output):

    onapp-test# mpathconf
    multipath is enabled
    find_multipaths is disabled
    user_friendly_names is enabled
    dm_multipath module is loaded
    multipathd is chkconfiged on
    CODE
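
    On clients where mpathconf is not available (for example, older CentOS 5 installs), roughly the same checks can be approximated with chkconfig, service, and lsmod; a minimal sketch, assuming the stock service name multipathd:

    onapp-test# chkconfig --list multipathd     # should be on in runlevels 3, 4, 5
    onapp-test# service multipathd status       # daemon state
    onapp-test# lsmod | grep dm_multipath       # kernel module loaded?
    CODE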


  3. Configure the iSCSI initiator settings. Depending on your client, the configuration files may or may not be in the same location(s). Modify /etc/iscsi/initiatorname.iscsi with a custom initiator name. By default, the file already contains an InitiatorName entry; replacing it with a custom entry is strictly optional and should be part of your naming convention decisions (a one-liner for this is sketched after the example below):

    InitiatorName=iqn.2011-10.test:1:server-testonapp-hv1
    CODE
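
    A minimal sketch of setting the initiator name non-interactively and restarting iscsid so the change is picked up; the IQN is the one from the settings above, and the service name assumes CentOS 5/6:

    onapp-test# echo "InitiatorName=iqn.2011-10.test:1:server-testonapp-hv1" > /etc/iscsi/initiatorname.iscsi
    onapp-test# service iscsid restart
    CODE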

       

  4. Create a virtual iSCSI interface for each physical network interface and bind the physical interfaces to the virtual iSCSI interfaces. The final result includes two new iSCSI interfaces: sw-iscsi-0 and sw-iscsi-1 bound to the physical network interfaces eth4 and eth5. Avoid naming logical interfaces with the same names as the physical NICs.
    Create the logical interfaces; a corresponding configuration file named sw-iscsi-0 and sw-iscsi-1 will be generated under /var/lib/iscsi/ifaces:

    onapp-test# iscsiadm --mode iface --op=new --interface sw-iscsi-0
    onapp-test# iscsiadm --mode iface --op=new --interface sw-iscsi-1
    CODE


  5. Following the two iscsiadm commands, modify the two newly created config files with additional parameters. Here is an example of one of the configuration files modified with the details about the physical interface. Note that we are defining parameters here specific to each interface and your configuration will certainly vary. 
    To quickly collect information about each interface, use the ip command: ip addr show <interface-name> | egrep 'inet|link'
    This configuration explicitly binds our virtual interfaces to physical interfaces. Each interface is on its own network (in our case, 10.99.140 and 10.99.139).
    We can always validate our configuration with: for i in 0 1; do iscsiadm -m iface -I sw-iscsi-$i; done, replacing 0 and 1 with the numbers of the iSCSI interface(s) in use. An iscsiadm-based alternative to editing the files by hand is sketched after the file listings below.

    onapp-test# cat /var/lib/iscsi/ifaces/sw-iscsi-0
    # BEGIN RECORD 6.2.0-873.2.el6
    iface.iscsi_ifacename = sw-iscsi-0
    iface.hwaddress = 6C:AE:8B:61:54:BC
    iface.transport_name = tcp
    iface.vlan_id = 0
    iface.vlan_priority = 0
    iface.iface_num = 0
    iface.mtu = 0
    iface.port = 0
    # END RECORD
    
    onapp-test# cat /var/lib/iscsi/ifaces/sw-iscsi-1
    # BEGIN RECORD 6.2.0-873.2.el6
    iface.iscsi_ifacename = sw-iscsi-1
    iface.hwaddress = 6C:AE:8B:61:54:BD
    iface.transport_name = tcp
    iface.vlan_id = 0
    iface.vlan_priority = 0
    iface.iface_num = 0
    iface.mtu = 0
    iface.port = 0
    # END RECORD
    CODE
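
    Instead of editing the files by hand, the same bindings can be applied with iscsiadm itself; a minimal sketch, assuming the MAC addresses shown above for eth4 and eth5:

    onapp-test# iscsiadm -m iface -I sw-iscsi-0 --op=update -n iface.hwaddress -v 6C:AE:8B:61:54:BC
    onapp-test# iscsiadm -m iface -I sw-iscsi-1 --op=update -n iface.hwaddress -v 6C:AE:8B:61:54:BD
    onapp-test# for i in 0 1; do iscsiadm -m iface -I sw-iscsi-$i; done    # validate the bindings
    CODE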

           

  6. Depending on your client, the TCP/IP kernel parameter rp_filter may need to be tuned to allow correct multipathing with iSCSI. For the purposes of this guide, set it to 0. For each physical interface, add an entry to /etc/sysctl.conf. In our example, we are modifying this tunable for eth4 and eth5. If kernel tuning is applied, either set the parameters via the sysctl command or reboot the system prior to the next steps (a sketch of applying the values without a reboot follows the example):

    onapp-test# grep eth[0-9].rp_filter /etc/sysctl.conf
        net.ipv4.conf.eth4.rp_filter = 0
        net.ipv4.conf.eth5.rp_filter = 0
    CODE
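
    A minimal sketch of applying the tunables immediately without a reboot; sysctl -w sets the values in the running kernel, and sysctl -p re-reads /etc/sysctl.conf:

    onapp-test# sysctl -w net.ipv4.conf.eth4.rp_filter=0
    onapp-test# sysctl -w net.ipv4.conf.eth5.rp_filter=0
    onapp-test# sysctl -p     # alternatively, re-apply everything from /etc/sysctl.conf
    CODE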


  7. Discover the targets via the portals that we exposed previously. We can run the discovery against either of the two portals, and the result should be identical. We discover targets for each configured logical iSCSI interface. If you choose to have more than two logical interfaces, and therefore more than two paths to the SAN, perform the following step for each logical interface:

    onapp-test# iscsiadm -m discovery -t sendtargets --portal=10.99.140.41 -I sw-iscsi-0 --discover
    onapp-test# iscsiadm -m discovery -t sendtargets --portal=10.99.139.41 -I sw-iscsi-1 --discover
    CODE

           

  8. Validate nodes created as a result of the discovery. We expect to see two nodes for each portal on the SAN:

    onapp-test# iscsiadm -m node
    10.99.140.41:3260,2 iqn.2010-08.org.example:onappiscsi
    10.99.140.41:3260,2 iqn.1986-03.com.nexenta:ecd714e43149
    10.99.139.41:3260,2 iqn.2010-08.org.example:onappiscsi
    10.99.139.41:3260,2 iqn.1986-03.com.nexenta:ecd714e43149
    CODE


  9. At this point, we have each logical iSCSI interface configured to log in to all known portals on the SAN and into all known targets. We can keep this configuration or choose to isolate each logical interface to a single portal on the SAN. Log in to all discovered nodes (a sketch of selective, per-target logins follows the command):

     onapp-test# iscsiadm -m node -l
    CODE
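
     If you prefer to isolate paths instead of the blanket login above, individual node records can be logged in selectively; a minimal sketch, using the target and portal names from this document:

     onapp-test# iscsiadm -m node -T iqn.2010-08.org.example:onappiscsi -p 10.99.140.41:3260 -I sw-iscsi-0 --login
     onapp-test# iscsiadm -m session      # verify the resulting sessions
     CODE
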
  10. Each node will have a directory under /var/lib/iscsi/nodes with a name identical to the target name on the SAN and a subdirectory for each portal. Here, the leaf objects of this tree structure are files named identically to our logical iSCSI interfaces; in fact, these are config files generated upon successful target discovery for each interface.

    onapp-test# ls -l /var/lib/iscsi/nodes/iqn.2010-08.org.example:onappiscsi/10.99.139.120\,3260\,3/
    total 8
    -rw------- 1 root root 1878 Jun 28  2013 sw-iscsi-0
    -rw------- 1 root root 1878 Jun 28  2013 sw-iscsi-1
    
    onapp-test# ls -l /var/lib/iscsi/nodes/iqn.1986-03.com.nexenta:ecd714e43149/10.99.140.220\,3260\,3/
    total 8
    -rw------- 1 root root 1862 May 21 08:13 sw-iscsi-0
    -rw------- 1 root root 1862 May 21 08:47 sw-iscsi-1
    CODE

    Deleting one of the interface configuration files under a node restricts that interface from logging in to the corresponding target. At any time, you can choose which interfaces log in to which portals (an iscsiadm-based alternative is sketched below). For the purposes of this document, we assume that it is acceptable for each logical iSCSI interface to log in to both portals.
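
    Rather than deleting files by hand, the per-interface node records can also be removed with iscsiadm; a minimal sketch, assuming you want to stop sw-iscsi-1 from logging in to the first target via the 10.99.139.41 portal:

    onapp-test# iscsiadm -m node -T iqn.2010-08.org.example:onappiscsi -p 10.99.139.41:3260 -I sw-iscsi-1 --op=delete
    CODE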

  11. Because there is one additional layer between the OS and iSCSI (the DM-MP layer), we need to tune the iSCSI parameters that control how quickly failed commands are handed off to the DM-MP layer. By default, the iSCSI service takes 120 seconds to give up on a command when there are issues completing it. We want this period to be much shorter, allowing DM-MP to try another path instead of potentially retrying down the same failed path for two minutes. Modify /etc/iscsi/iscsid.conf: comment out the default entry with the value 120 (seconds) and add a new entry with the value 10 (a sketch of pushing the new value into already-discovered nodes follows the example).

    onapp-test# grep node.session.timeo.replacement_timeout /etc/iscsi/iscsid.conf
    #node.session.timeo.replacement_timeout = 120
    node.session.timeo.replacement_timeout = 10
    CODE
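
    The iscsid.conf default applies to node records created after the change; for targets that were already discovered, a hedged way to update the existing records and re-login so the new timeout takes effect is:

    onapp-test# iscsiadm -m node --op=update -n node.session.timeo.replacement_timeout -v 10
    onapp-test# iscsiadm -m node -u && iscsiadm -m node -l    # note: this briefly logs all sessions out
    CODE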
  12. There are a number of parameters that we have to configure for the device mapper to multipath these LUNs correctly. First, configure the /etc/multipath.conf file with the basic settings necessary to properly manage multipath behavior and path failure. This is not a be-all-and-end-all configuration, but rather a very good starting point for most NexentaStor deployments. Be aware that this configuration may fail on systems with an older version of multipath; on Debian and older Red Hat-based distributions, the parameters that may need to be adjusted are checker_timer, getuid_callout, and path_selector. Be certain to review your distribution's multipath documentation or a commented sample multipath.conf file:

    defaults {
                checker_timer               120
                getuid_callout              "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
                no_path_retry               12
                path_checker                directio
                path_grouping_policy        group_by_serial
                prio                        const
                polling_interval            10
                queue_without_daemon        yes
                rr_min_io                   1000
                rr_weight                   uniform
                selector                    "round-robin 0"
                udev_dir                    /dev
                user_friendly_names         yes
        }
        blacklist {
                devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
                devnode "^hd[a-z]"
                devnode "^sda"
                devnode "^sda[0-9]"
                device {
                        vendor DELL
                        product "PERC|Universal|Virtual"
                }
        }
        devices {
              device {
                      ## This section Applicable to ALL NEXENTA/COMSTAR provisioned LUNs
                      ## And will set sane defaults, necessary to multipath correctly
                      ## Specific parameters and deviations from the defaults should be
                      ## configured in the multipaths section on a per LUN basis
                      ##
                      vendor                NEXENTA
                      product               NEXENTASTOR
                      path_checker          directio
                      path_grouping_policy  group_by_node_name
                      failback              immediate
                      getuid_callout        "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
                      rr_min_io             2000
                      ##
                      ##
                      ## Adjust `node.session.timeo.replacement_timeout` in /etc/iscsi/iscsid.conf
                      ## in order to rapidly fail commands down to the multipath layer
                      ## and allow DM-MP to manage path selection after failure
                      ## set node.session.timeo.replacement_timeout = 10
                      ##
                      ## The features parameter works with replacement_timeout adjustment
                      features              "1 queue_if_no_path"
                }
        }
    CODE

        

  13. Define an entry in the multipaths section for each LUN. Any parameters already set in the device and defaults sections can be omitted here, provided we accept those global settings. Note that we explicitly define the WWID for each LUN. We also supply an alias, which makes it easier to identify specific LUNs; this is strictly optional. The vendor and product parameters must match what the SAN reports (NEXENTA and NEXENTASTOR in this setup) unless explicitly changed on the SAN, which is out of the scope of this configuration (a sketch of collecting the WWIDs from the client follows the example).

     multipaths {
                    ## Define specifics about each LUN in this section, including any
                    ## parameters that will be different from defaults and device
                    ##
               multipath {
                      alias                  mpathc
                      wwid                  3600144f05415410000005077d06b0002
                      vendor                NEXENTA
                      product               NEXENTASTOR
                      path_selector         "service-time 0"
                      failback              immediate
               }
               multipath {
                      alias                mpathb
                      wwid                3600144f05415410000005077d05b0001
                      vendor                NEXENTA
                      product               NEXENTASTOR
                      path_selector         "service-time 0"
                      failback              immediate
                }
    
        }
    CODE
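
    The WWIDs used above can be collected on the client with the same scsi_id helper referenced in getuid_callout; a minimal sketch, where /dev/sdd stands for one of the iSCSI block devices on your system (actual device names will vary):

    onapp-test# /lib/udev/scsi_id --whitelisted --device=/dev/sdd
    3600144f05415410000005077d05b0001
    onapp-test# for d in /dev/sd[b-g]; do echo -n "$d  "; /lib/udev/scsi_id --whitelisted --device=$d; done
    CODE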


       

  14. Save the file as /etc/multipath.conf. Flush and reload the device-mapper maps. At this point, we assume the multipathd service is running on the system. Running multipath -v2 gives enough detail to make sure the maps are being created correctly (a sketch of dumping the configuration multipathd actually applied follows the sample output):

    onapp-test# multipath -F
    onapp-test# multipath -v2
    CODE


    A typical multipath configuration for any single LUN is similar to the following:

    [onapp-test ~]# multipath -ll
      mpathc (3600144f05415410000005077d06b0002) dm-4 NEXENTA,NEXENTASTOR
        size=4.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
        `-+- policy='round-robin 0' prio=1 status=active
          |- 26:0:0:2 sdg 8:96 active ready running
          `- 25:0:0:2 sdd 8:48 active ready running
    CODE
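
    To confirm which settings multipathd actually applied, its running configuration and path states can be dumped from the interactive console; a minimal sketch, assuming the CentOS 6 multipathd build that accepts inline -k commands:

    onapp-test# multipathd -k"show config" | less    # merged defaults, device, and multipath settings
    onapp-test# multipathd -k"show paths"            # per-path state as seen by the daemon
    CODE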


  15. The resulting multipath devices can be used to create a data store:

    mpathc (3600144f05415410000005077d06b0002) dm-4 NEXENTA,NEXENTASTOR
    size=4.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
    `-+- policy='round-robin 0' prio=1 status=active
      |- 26:0:0:2 sdg 8:96 active ready running
      `- 25:0:0:2 sdd 8:48 active ready running
    
    mpathb (3600144f05415410000005077d05b0001) dm-3 NEXENTA,NEXENTASTOR
    size=4.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
    `-+- policy='round-robin 0' prio=1 status=active
      |- 26:0:0:1 sdf 8:80 active ready running
      `- 25:0:0:1 sdb 8:16 active ready running 
    CODE


Steps in OnApp (Configure the data store within OnApp and the compute resource)


Please note that you are required to create a data store zone or use an existing data store zone prior to following these instructions.


  1. Go to your Control Panel and create two new data stores, one for each resulting multipath device (mpathc, mpathb). This creates an entry in the database, and the resulting UUID is used to create the data store manually on the compute resource.
  2. Log in to the compute resource. Create the data stores on the compute resource using the UUIDs. iSCSI/FC data stores use LVM. Create two physical volumes, one for each DS:

    pvcreate --metadatasize 50M /dev/mapper/mpathc
    pvcreate --metadatasize 50M /dev/mapper/mpathb
    CODE


  3. Create volume groups using the UUIDs from the UI:

    vgcreate onapp-usxck5bmj0vqrg /dev/mapper/mpathc
    
    vgcreate onapp-m8fhwar2eenf04 /dev/mapper/mpathb 
    CODE


  4. Check whether both data stores are available within the compute resources: 

    onapp-test # pvscan
      PV /dev/mapper/mpathc   VG onapp-usxck5bmj0vqrg   lvm2 [4.00 TiB / 650.90 GiB free]
      PV /dev/mapper/mpathb   VG onapp-m8fhwar2eenf04   lvm2 [4.00 TiB / 679.90 GiB free]
      PV /dev/sda2            VG vg_onaappkvm1          lvm2 [277.97 GiB / 27.97 GiB free]
    
      Total: 3 [8.66 TiB] / in use: 3 [8.66 TiB] / in no VG: 0 [0   ]
    CODE

     

  5. Go to your Control Panel and add the data stores to the compute resource zone.

    When creating a virtual server, a logical volume will be created on the chosen data store. This will be the disk of your virtual server (a quick check is sketched below).
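
    As a sanity check after a virtual server is built, the logical volumes that OnApp created on each data store can be listed from the compute resource; a minimal sketch using the volume group names from this example:

    onapp-test# vgs                                              # both onapp-* volume groups should be listed
    onapp-test# lvs onapp-usxck5bmj0vqrg onapp-m8fhwar2eenf04    # one logical volume per virtual server disk
    CODE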