Windows Failover Cluster monitoring

Clusters can be an extremely powerful tool for IT administrators to ensure application availability and performance. However, if left unmonitored, a well-performing cluster can easily hide hardware or software failures that reduce its redundancy. Accurate monitoring of cluster health is therefore crucial to preserving the redundancy that makes a cluster worth having.

Uptrends Infra now provides powerful tools to keep track of your Windows Failover Cluster, allowing you to fix problems before your users notice anything is wrong. We’ll show you how to get started.

Agent setup

Setup

To perform accurate measurements on the cluster as a whole, we recommend using an Agent on a server that is outside the cluster. Follow the normal procedure for Agent installation.

Two additional steps are then required before you can start creating sensors:

  1. Create a set of credentials on the Agent with administrative privileges on all of the cluster’s nodes.
    Specifically, the user account you specify must be able to run remote WMI queries against the Root\MSCluster namespace. By default, members of the Administrators group on any Windows Server have all required permissions. If you wish to use a non-administrative account, see Appendix A for a list of useful web resources.
  2. Create a new device on the Agent with the name of your cluster. In the Address field, enter a network name or IP address that corresponds to a Network Name or IP Address Resource for the cluster that you wish to monitor.
    If you prefer, you can create a new IP Address Resource for this purpose. Make sure that the IP Address Resource is in the same Resource Group as the applications that you wish to monitor, so that the IP address always moves to the active server along with them.

Once this is done, you can use the newly created Device to configure sensors that apply to the cluster as a whole (rather than one particular server).
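
Before creating sensors, it can help to confirm that the account really can query the Root\MSCluster namespace remotely. The following is a minimal sketch, not part of Uptrends Infra: it assumes you run it on the Agent server while logged on as the account in question, that Windows PowerShell (powershell.exe) with the Get-WmiObject cmdlet is available there, and that the MSCluster_Node class is the one you want to list. The helper name build_wmi_check is hypothetical.

```python
import subprocess
import sys


def build_wmi_check(cluster_address: str) -> list:
    """Build a Windows PowerShell command that lists cluster nodes over WMI.

    cluster_address is the network name or IP address entered in the
    device's Address field.
    """
    if "'" in cluster_address or '"' in cluster_address:
        # Keep the quoting of the PowerShell command simple and safe.
        raise ValueError("address must not contain quote characters")
    query = (
        "Get-WmiObject -Namespace 'root\\MSCluster' -Class MSCluster_Node "
        f"-ComputerName '{cluster_address}' | Select-Object Name, State"
    )
    return ["powershell.exe", "-NoProfile", "-Command", query]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (on Windows): python check_wmi.py <cluster name or IP address>
    command = build_wmi_check(sys.argv[1])
    result = subprocess.run(command, capture_output=True, text=True)
    # A list of node names means the query was accepted; an "Access denied"
    # style error suggests the account lacks the required permissions.
    print(result.stdout or result.stderr)
```

If the command returns a list of node names, the credentials are likely sufficient for the cluster sensors described below.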

Individual node measurements

Depending on your needs, you may also want to measure the performance of all nodes in your cluster individually. There are no special considerations; just add devices on a central agent or install local agents on each node as desired (using the normal procedure).

Number of available nodes

When your cluster works well, you might not notice that one or more nodes have failed until it’s too late. By constantly measuring how many servers are available at any given time, you know whether your cluster still has enough redundancy to absorb a sudden hardware or software failure.

Setup

  1. Edit the cluster device (as created before), and go to the Sensors tab.
  2. Click the Add Sensor button.
  3. In the Add Sensor window, check the box next to Show advanced sensor types.
  4. From the list of available sensor types, select Cluster: Available Nodes.
  5. Click Add Sensor.
  6. As with any other sensor, configure the basic settings such as Short Code and sampling/storage frequencies.
  7. Click the Sensor properties tab.
  8. Enter your desired error and warning thresholds.
  9. Click Save to create the new sensor. You should see the first measurements on the Cluster Device’s Detail Dashboard soon.
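
To make the idea behind the thresholds concrete, here is a small sketch of how a count of available nodes could be compared against a warning and an error level. It illustrates the concept only and is not how the sensor is implemented. It assumes node states follow the MSCluster_Node convention in which 0 means the node is up, and that a status is raised when the count drops below a threshold; the function names and sample data are hypothetical.

```python
from typing import Iterable, Tuple

NODE_UP = 0  # assumed MSCluster_Node.State value for a node that is up


def count_available(nodes: Iterable[Tuple[str, int]]) -> int:
    """Count the nodes whose state equals NODE_UP."""
    return sum(1 for _name, state in nodes if state == NODE_UP)


def classify(available: int, warning_below: int, error_below: int) -> str:
    """Map a node count to 'ok', 'warning' or 'error'.

    Assumption: a level is triggered when the count is strictly below it.
    """
    if available < error_below:
        return "error"
    if available < warning_below:
        return "warning"
    return "ok"


# Hypothetical three-node cluster in which NODE3 is down (state 1).
sample = [("NODE1", 0), ("NODE2", 0), ("NODE3", 1)]
available = count_available(sample)
print(available, classify(available, warning_below=3, error_below=2))
# prints: 2 warning
```

For a three-node cluster, a warning level of 3 and an error level of 2 would flag the loss of a single node as a warning and the loss of a second node as an error.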

Currently active node

It’s often useful to know which node is currently hosting the services in a given resource group, for example to see whether a manual or automatic failover has occurred.

Setup

  1. Edit the cluster device (as created before), and go to the Sensors tab.
  2. Click the Add Sensor button.
  3. In the Add Sensor window, check the box next to Show advanced sensor types.
  4. From the list of available sensor types, select Cluster: Active node.
  5. Click Add Sensor.
  6. As with any other sensor, configure the basic settings such as Short Code and sampling/storage frequencies.
  7. Click the Sensor properties tab.
  8. Click Load Resource Groups to load the available Resource Groups from the cluster’s configuration.
  9. Select (or type) the name of the Resource Group for which to determine the active node.
  10. Click Save to create the new sensor. You should see the first measurements on the Cluster Device’s Detail Dashboard soon.

Note that it is currently not possible to receive alerts when this sensor type detects a node change (failover). This functionality will become available in a future release.
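
In the meantime, a failover can be recognized by comparing consecutive readings of the active node. The sketch below shows that comparison on hypothetical data. It assumes you have a list of (timestamp, node name) readings from some source of your own, with None standing for a failed measurement; the function name and the sample values are illustrative and not part of the product.

```python
from typing import Iterable, List, Optional, Tuple


def find_failovers(
    readings: Iterable[Tuple[str, Optional[str]]]
) -> List[Tuple[str, str, str]]:
    """Return (timestamp, old_node, new_node) for every change of active node."""
    changes = []
    previous = None
    for timestamp, owner in readings:
        if owner is None:
            continue  # skip failed measurements instead of treating them as a change
        if previous is not None and owner != previous:
            changes.append((timestamp, previous, owner))
        previous = owner
    return changes


# Hypothetical readings for one Resource Group.
readings = [
    ("09:00", "NODE1"),
    ("09:05", "NODE1"),
    ("09:10", None),
    ("09:15", "NODE2"),
    ("09:20", "NODE2"),
]
print(find_failovers(readings))
# prints: [('09:15', 'NODE1', 'NODE2')]
```

Skipping failed measurements avoids reporting a failover when a single reading is simply missing.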

Other measurements

In addition to the clustering-specific sensor types, you can add most of the other available sensors to the Cluster Device. Such sensors will then automatically perform their measurements on whichever server is currently hosting the configured IP address for the device. If the IP Address Resource is correctly defined in the same Resource Group as your applications and services, the measurements are always taken on the node that currently owns that group.

Appendix A: Configuring Windows Servers for remote WMI querying

The following Microsoft articles should help you configure your clustered servers to accept incoming WMI queries from the Uptrends Infra Agent for monitoring purposes.