This menu displays the results of past diagnostic tests. Below you will find details on all possible results shown for the following resources:
|Degraded disks||Shows the list of VDisks in a degraded state, which means that one or more (but not all) replicas of a stripe are not fully synchronised. Degraded VDisks are listed with the OnApp vd_uuid and a repair option.||Use the Repair All option to queue the repairs. A repair resynchronises the content from an elected master to a slave. The repair button starts a repair task that will take some time, depending on the data store, network and disk drive configuration.|
|Disks with partial memberlist||Shows the list of VDisks having an incomplete membership list, due to disk failure, network failure or otherwise. Each VDisk should have (S)tripe * (R)eplica members.||Use the repair operation to repair the membership. This will elect a new valid member from the suitable nodes in the data store. Once the membership is repaired, the VDisk will be in a degraded state until it is re-synced.|
|Stripes with no replica||Shows the list of VDisks which have lost all replicas for a stripe. There is no redundancy at this point for this stripe and the data is lost. If a VDisk is in this category then the associated VS is likely broken unless the VDisk is a swap drive.||No repair action available.|
|Disks with no redundancy found||One or more VDisks do not have a replica stripe member on another Compute resource. The VDisk is healthy, but all replicas of a stripe are on the same Compute resource.||Use the Rebalance link in the Actions column, which leads to the rebalance page for a VDisk. This allows the content of a VDisk to be rebalanced to another suitable disk drive.|
|Partially online disks found||The list of VDisks that have at least one stripe online and at least one stripe offline. There must be an authoritative member for each stripe.||Use the Repair link in the Actions column, which issues a special Storage API call (online refresh action) to fix this problem. Before the repair, the VDisk status will show offline, but one or more members will show an online front end.|
|Degraded snapshots||The list of VDisk snapshots in a degraded state (except ones currently being used for ongoing backups). Backups cannot be made from a degraded snapshot.||To resolve this, use the bulk Delete All link in the Actions column, which creates a background task. This task unmounts the snapshot, removes its partition mappings with kpartx, takes zombie snapshots offline on each Compute resource in the zone, and then removes the snapshot. The task may leave some snapshot VDisks behind, so check for unremoved VDisks upon task completion.|
|Zombie disks found||The list of VDisks that are not associated with any VS. These may include VDisks created from the command line and VDisks created for benchmarks.||To resolve, use the bulk Delete All link in the Actions column, which creates a background task. This task unmounts the disk, removes its partition mappings with kpartx, takes zombie disks offline on each Compute resource in the zone, and then removes the disk. The task may leave some zombie disks behind, so check for unremoved disks upon task completion.|
|Disks in other degraded states||The list of VDisks that are degraded but not in any of the other states above. These can be disks with missing partial members, missing inactive members, missing active members, or missing unknown members.||No repair action available.|
|Partial node found||The Compute resource hosting the node is reachable and reports over the API that the node is running, but the storageAPI and groupmon services may not be responding on the storage controller server.||To fix, perform a controller restart. Make sure that there is sufficient redundancy so that restarting the controllers on one Compute resource will not cause VS downtime.|
|Inactive nodes found||Either the Compute resource hosting the node is not reachable, or it is reachable but reports that the storage controller for the node is not running.||Either power-cycle the Compute resource, or bring up the storage controller VS. This can be tricky if more than one storage controller is running on the same Compute resource and only one has shut down.|
|Nodes with delayed ping found||The node is reachable over the storage API but is not sending out pings; the groupmon service is not responding on the node.||To fix this problem, restart the groupmon service from inside the storage controller server; this can be triggered from the UI.|
|Nodes with high utilization found||The list of nodes with disk utilization over 90%.||To improve this, click the Rebalance link in the Actions column, which leads to the list of disks located on the node so that you can rebalance them away from it.|
|Out of space nodes found||Node utilisation is reported at 100% for one or more nodes.||The Repair action will forget the content of one of the VDisks that is Compute resource redundant and in sync.|
|Missing drives found||The Compute resource configuration has a drive selected that is not being reported to Integrated Storage.||No repair action available. The Compute resource configuration edit page can be opened from the reported error to deselect the drive if appropriate.|
|Extra Drives||Drives that have been hot-plugged into the system.||No repair action available from the UI.|
|Inactive controllers||The list of controllers that cannot be reached but the host Compute resource is responding.||Restart the controller.|
|Unreferenced NBDs found||The list of NBD data paths that are active but not referenced by a device mapper.||To fix, use the Delete All link to schedule a CP transaction, which will try to clean up the unreferenced NBDs by disconnecting them from the frontend.|
|Reused NBDs found||The list of multiple uses of the same NBD connection.||No repair action available from the UI.|
|Dangling device mappers found||The list of device mappers that are not in use.||To fix, look for the corresponding VS. If the VS is booted, do nothing; otherwise, try to unmount and offline the VDisk.|
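The idea behind this check can be sketched in a few lines of Python. This is an illustrative sketch only, not the OnApp implementation, and the sample mapper names are hypothetical: a device mapper with an open count of zero (as reported, for example, by `dmsetup info -c -o name,open`) is not in use.

```python
# Illustrative sketch: identify "dangling" device mappers, i.e. those whose
# open count is zero, from (name, open_count) pairs. The names below are
# hypothetical examples, not real OnApp identifiers.

def find_dangling(mappers):
    """Return the names of device mappers that are not in use (open count 0)."""
    return [name for name, open_count in mappers if open_count == 0]

# Two mappers are held open by a running VS; one is dangling.
sample = [("vdisk-aa11", 1), ("vdisk-bb22", 0), ("vdisk-cc33", 2)]
print(find_dangling(sample))  # ['vdisk-bb22']
```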
Note that starting with ATA/ATAPI-4, revision 4, the meaning of these Attribute fields has been made entirely vendor-specific. However most newer ATA/SATA disks seem to respect their meaning, so the option of printing the Attribute values is retained.
Solid-state drives use different meanings for some of the attributes. In this case the attribute name printed by smartctl is incorrect unless the drive is already in the smartmontools drive database.
Because SMART is vendor-specific, not all drives support it. Nonetheless, most do, provided that SMART reporting is enabled in the BIOS and that the hardware supports SMART.
If the drives are behind a RAID or another controller, the controller must also support SMART passthrough for SMART to work. Specific BIOS and firmware upgrades may enable SMART support; however, it remains very much hardware- and configuration-dependent.
|SMART errors found||For one or more disk drives in the Compute resource, SMART built-in tests have reported one or more warnings. SMART errors occur when the drive has surpassed the threshold for reporting a failure.||Replace the drives during a maintenance window.|
|SMART warnings found||SMART warnings occur when the failure attributes exist but are not at the threshold level, for either Pre-failure or Old age attributes. Old age, or usage, Attributes indicate end-of-product life from old age or normal aging and wear-out if the Attribute value is less than or equal to the threshold.||Please note: the fact that an Attribute is of type 'Pre-fail' does not mean that your disk is about to fail! It only has this meaning if the Attribute's current Normalized value is less than or equal to the threshold value.|
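The threshold rule above can be expressed as a small sketch. This is an illustration of the SMART semantics described here, not part of the product; the status labels mirror the conventional smartctl ones, and the numeric values in the example are hypothetical.

```python
# Illustrative sketch of the SMART threshold rule: an attribute only signals a
# problem when its current normalized value is less than or equal to its
# threshold. What that problem means depends on the attribute type.

def attribute_status(attr_type, normalized_value, threshold):
    """Classify a SMART attribute reading.

    attr_type is 'Pre-fail' or 'Old_age'; values are normalized (0-255 scale).
    """
    if normalized_value > threshold:
        return "OK"            # above threshold: no warning, whatever the type
    if attr_type == "Pre-fail":
        return "FAILING_NOW"   # pre-failure attribute at/below its threshold
    return "END_OF_LIFE"       # usage attribute at/below threshold: wear-out

# A 'Pre-fail' attribute well above its threshold is healthy:
print(attribute_status("Pre-fail", 100, 36))  # OK
# The same attribute at its threshold indicates imminent failure:
print(attribute_status("Pre-fail", 36, 36))   # FAILING_NOW
```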