Introduction to vSphere HA

Learner Objectives

After completing this Introduction to vSphere HA lesson lesson, you should be able to meet the following objectives:

Identify options for configuring a highly available vSphere environment
Describe how vSphere HA responds when an ESXi host, a virtual machine, or an application fails

Protection at Every Level

With vSphere, you can reduce planned downtime, prevent unplanned downtime, and recover rapidly from outages.

About vSphere HA

vSphere HA provides rapid recovery from outages and cost-effective high availability for applications running in VMs. vSphere HA protects application availability in several ways.

Protects Against	How Does vSphere HA Provide Protection?
ESXi host failure	By restarting the VMs on other hosts within the cluster
VM failure	By restarting the VM when a VMware Tools heartbeat is not received within a set time
Application failure	By restarting the VM when an application heartbeat is not received within a set time
Datastore accessibility failure	By restarting the affected VMs on other hosts that still can access the datastores.
Network isolation	By restarting VMs if their host becomes isolated on the management or vSAN network. This protection is provided even if the network becomes partitioned.

vSphere HA Scenario: ESXi Host Failure

When a host fails, vSphere HA restarts the impacted VMs on other hosts in the cluster. Video 12103

vSphere HA Scenario: Guest Operating System Failure

When a VM stops sending heartbeats or the VM process (vmx) fails unexpectedly, vSphere HA resets the VM. Video 12108

vSphere HA Scenario: Application Failure

When an application fails, vSphere HA restarts the impacted VM on the same host. Video 12113

vSphere HA Scenario: Datastore Accessibility Failures

If VM Component Protection (VMCP) is enabled, vSphere HA can detect datastore accessibility failures and provide automated recovery for affected VMs. You can determine the response that vSphere HA makes to such a failure, ranging from the creation of event alarms to VM restarts on other hosts:

All paths down (APD):
- Recoverable.
- Represents a transient or unknown accessibility loss.
- Response can be either Issue events, Power off and restart VMs – Conservative restart policy, or Power off and restart VMs – Aggressive restart policy.
Permanent device loss (PDL):
- Unrecoverable loss of accessibility.
- Occurs when a storage device reports that the datastore is no longer accessible by the host.
- Response can be either Issue events or Power off and restart VMs.

vSphere HA Scenario: Protecting VMs Against Network Isolation vSphere HA restarts VMs if their host becomes isolated on the management or vSAN network.

Host network isolation occurs when a host is still running, but it can no longer observe traffic from vSphere HA agents on the management network:

The host tries to ping the isolation addresses. An isolation address is an IP address or FQDN that can be manually specified (the default is the host’s default gateway).
If pinging fails, the host declares that it is isolated from the network.
This protection is provided even if the network becomes partitioned.

Importance of Redundant Heartbeat Networks

Redundant heartbeat networks ensure reliable failure detection and minimize the chance of host isolation scenarios. In a vSphere HA cluster, heartbeats have the following characteristics:

They are sent between the master host and the subordinate hosts.
They are used to determine whether a master host or a subordinate host has failed.
They are sent over a heartbeat network.

Redundancy Using NIC Teaming

A heartbeat network is implemented in the following ways:

By using a VMkernel port that is marked for management
By using a VMkernel port that is marked for vSAN traffic when vSAN is in use You can use NIC teaming to create a redundant heartbeat network on ESXi hosts.

Redundancy Using Additional Networks

You can create redundancy by configuring more heartbeat networks.

On each ESXi host, create a second VMkernel port on a separate virtual switch with its own physical adapter.

Redundant management networking supports the reliable detection of failures and prevents isolation or partition conditions from occurring, because heartbeats can be sent over multiple networks.

Review of Learner Objectives

After completing this Introduction to vSphere HA lesson, you should be able to meet the following objectives:

Identify options for configuring a highly available vSphere environment
Describe how vSphere HA responds when an ESXi host, a virtual machine, or an application fails

Join the conversation