Integrated Storage Data Stores

The integrated storage data store functionality combines physical disks from compute resources into a virtual data store, creating a distributed block-based storage system as an alternative to SAN systems. You can remove disks from one server, add them to any other server, and run them anywhere in the system without impacting the operation of your SAN. The disks in the SAN are grouped by performance.

An integrated storage data store is separated into storage channels (API endpoints) that correspond to compute zones, which allows storage to be managed within a zone. Cross-zone storage transfers are currently not supported.

It is possible to keep all stripes and replicas on a single compute resource, but this will reduce the efficiency of a distributed storage system.

The number of compute resources used for creating integrated storage data stores must match the number of chosen replicas. Each compute resource should have as many disk drives as there are stripes (one drive, if striping is not used).

To be able to rebalance or migrate the data, two compute resources (HVs) must be used with a configuration of at least two replicas.
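The layout rules above can be sketched as a small check. This is only an illustration: the function name check_layout and the dictionary input are hypothetical and not part of the product, "must match" is read as "equal to", and the per-resource drive rule is read as "at least the stripe count".

```python
def check_layout(drives_per_resource, replicas, stripes):
    """Return a list of problems; an empty list means the layout fits the rules.

    drives_per_resource maps a compute resource name to its drive count.
    Assumption: 'no stripes' is treated the same as a stripe count of 1.
    """
    problems = []
    needed_drives = max(stripes, 1)

    # Rule 1: number of compute resources must match the number of replicas.
    if len(drives_per_resource) != replicas:
        problems.append(
            f"{len(drives_per_resource)} compute resource(s) for {replicas} replica(s)"
        )

    # Rule 2: each compute resource needs one drive per stripe.
    for name, count in drives_per_resource.items():
        if count < needed_drives:
            problems.append(f"{name} has {count} drive(s), needs {needed_drives}")

    # Rule 3: rebalancing or migration needs two HVs and at least two replicas.
    if len(drives_per_resource) < 2 or replicas < 2:
        problems.append("rebalance/migration needs two HVs and at least two replicas")

    return problems


if __name__ == "__main__":
    print(check_layout({"HV1": 2, "HV2": 2}, replicas=2, stripes=2))
    print(check_layout({"HV1": 1}, replicas=2, stripes=2))
```

The first call describes a two-HV layout with two drives each and reports no problems; the second shows how a single under-provisioned compute resource fails all three checks.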

For example, consider a system where two compute resources are used with four hard disk drives spread between them, sized 100GB, 200GB, 300GB, and 400GB. The 100GB and 200GB drives are in HV1 and the 300GB and 400GB drives are in HV2, as shown in Figure 1.

If we have a data store with 2 replicas and 2 stripes and no overcommit, the largest VS we could create is 200GB, assuming all the drives are empty. This is because each replica of the 200GB VS is split into two stripes of 100GB each, so every one of the four drives must hold 100GB. The first (100GB) drive would then be fully occupied and each of the others would have 100GB of occupied storage, as shown in Figure 2. We would not be able to create any other VSs on this data store, which demonstrates that it is desirable to have HDDs that are roughly equal in size. As long as there are replica*stripe drives with free space, we can create a VS. The size of the VS we can create will be stripe*(the smallest free size available on all of the stripes).
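The sizing rule can be checked with a short calculation. This is a minimal sketch, not the product's allocation logic: the function name max_vs_size_gb is hypothetical, the sketch ignores which compute resource each drive belongs to, and it assumes the drives with the most free space are chosen.

```python
def max_vs_size_gb(free_gb, replicas, stripes):
    """Largest VS size (GB) under the rule: stripes * smallest free space.

    free_gb lists the free space on each drive. The VS needs
    replicas * stripes drives that still have free space.
    """
    needed = replicas * stripes
    # Assumption: pick the drives with the most free space first.
    candidates = sorted((f for f in free_gb if f > 0), reverse=True)[:needed]
    if len(candidates) < needed:
        return 0  # not enough drives with free space
    return stripes * min(candidates)


if __name__ == "__main__":
    drives = [100, 200, 300, 400]  # empty drives from the example
    size = max_vs_size_gb(drives, replicas=2, stripes=2)
    print(size)

    # Every drive stores one stripe of size / stripes GB.
    after = [f - size // 2 for f in drives]
    print(after)
    print(max_vs_size_gb(after, replicas=2, stripes=2))
```

With the four empty drives from the example, the sketch gives a 200GB VS. Once the 100GB drive is full, only three drives have free space, so no further VS fits.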

How many virtual servers can reside on the integrated storage data store?

Use the following formula to calculate the number of virtual servers that can reside on the data store:

(Storage node memory size - 128) ÷ 4

Where:

Storage node memory size - the integrated storage node's memory size in MB

128 - the amount of system memory reserved for the storage controller in MB

4 - the amount of memory required for each NBD connection in MB

Then divide the result by the number of paths each virtual server requires:

If the data store has 2 replicas and 2 stripes, each disk requires 4 paths (2 replicas × 2 stripes). Linux virtual servers have 2 disks (main disk and swap), so 8 paths are required if both disks use the same data store configuration.

For example:

If the storage node memory is 1024 MB (the default value), then:

(1024 - 128) = 896 MB for NBD device paths

896 ÷ 4 = 224

This is the number of NBD paths available. Together with the data store disk configuration, it determines the maximum number of VDisks that can be created.

With the data store configuration described above (2 replicas and 2 stripes), each disk requires 4 paths. A Linux virtual server with 2 disks therefore requires 8 paths, and a Windows virtual server with 1 primary drive requires 4.

Then, the following number of virtual servers can be hosted on that data store:

  • 224 ÷ 8 = 28 Linux virtual servers

  • 224 ÷ 4 = 56 Windows virtual servers (with 1 primary drive)

To host more virtual servers in the cloud, we recommend using a lower configuration (fewer replicas and stripes, and therefore fewer paths) for swap drives.
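The whole calculation can be reproduced with a short sketch. The function names are hypothetical, the number of paths per disk is taken as replicas × stripes (consistent with the 2 × 2 = 4 example), and the 1-replica, 1-stripe swap disk in the last call is an assumed configuration used only to illustrate the recommendation.

```python
def nbd_paths(node_memory_mb=1024, reserved_mb=128, mb_per_path=4):
    """NBD paths a storage node can serve: (memory - 128) / 4."""
    return (node_memory_mb - reserved_mb) // mb_per_path


def paths_per_disk(replicas, stripes):
    """Assumption: one path per replica/stripe pair; no stripes counts as 1."""
    return replicas * max(stripes, 1)


def max_virtual_servers(node_memory_mb, disk_configs):
    """disk_configs holds one (replicas, stripes) pair per disk of a single VS."""
    per_vs = sum(paths_per_disk(r, s) for r, s in disk_configs)
    return nbd_paths(node_memory_mb) // per_vs


if __name__ == "__main__":
    print(nbd_paths(1024))                                # 224 paths
    print(max_virtual_servers(1024, [(2, 2), (2, 2)]))    # Linux: 28
    print(max_virtual_servers(1024, [(2, 2)]))            # Windows: 56
    # Hypothetical lower configuration for swap: 1 replica, 1 stripe.
    print(max_virtual_servers(1024, [(2, 2), (1, 1)]))    # 224 // 5 = 44
```

In this sketch, reducing the swap disk to a single path lowers the per-server requirement from 8 paths to 5, raising the count of Linux virtual servers from 28 to 44.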