HA unsuccessfully failed over VM on HOST in cluster. vSphere HA will retry if the maximum number of attempts has not been exceeded. Reason: Operation timed out.

Recently, during a Veeam backup, HA kicked in due to a problem with vCenter and marked two machines as Invalid. The sequence of events was:

  • Veeam started the backup and created snapshots on the affected VMs
  • HA kicked in after a vCenter Server failure
  • vCenter tried to move the machines after the HA event, but they were locked because of the Veeam snapshot (the hypervisor holds a lock while the snapshot exists)
  • The machines couldn’t be moved, but VMware removed them from the inventory and tried to register them on another host, without success
  • The machines were marked as Invalid; removing them from the inventory and registering them manually did not work
  • The machines were locked but still powered on and responsive

Fix to get the VMs registered in vCenter

  • Stop any tasks (e.g. Veeam jobs) that might trigger a snapshot on the affected machines
  • Temporarily disable DRS in the cluster (when we power up the machine we don’t want vSphere to power it up on another host, otherwise the lock will persist)
  • Power the affected VMs off gracefully from within the OS
  • Log in to vCenter and select the host which failed. Unregister the “unknown” VMs listed (these should be the actual VMs). Unregistering should remove the lock
  • Register the virtual machines using the vmx files in the datastore
  • Check that no snapshots are present; if there are, delete or consolidate them
  • Power on the VMs
  • Re-enable DRS
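
The unregister and re-register steps above were done through the vCenter client. As a rough sketch only, the same steps could also be written out as ESXi shell commands using `vim-cmd`, run on the host that holds the lock. This is an assumption about an alternative route, not what was done here. The helper name `recovery_commands`, the VM ID and the datastore path are hypothetical; the real ID comes from `vim-cmd vmsvc/getallvms`, and registering the VM prints a new ID that replaces the `<new-vmid>` placeholder. Disabling DRS and shutting the guest down still have to be done separately.

```python
import shlex


def recovery_commands(stale_vmid, vmx_path, new_vmid="<new-vmid>"):
    """Return, in order, the ESXi shell commands for the unregister/register steps.

    stale_vmid: ID of the invalid/unknown VM as listed on the host (hypothetical value).
    vmx_path:   full path to the .vmx file on the datastore (hypothetical value).
    new_vmid:   ID printed by the register command; left as a placeholder by default.
    """
    return [
        # List registered VMs and their IDs on this host
        "vim-cmd vmsvc/getallvms",
        # Unregister the stale entry (this does not delete any files)
        f"vim-cmd vmsvc/unregister {int(stale_vmid)}",
        # Register the VM again from its .vmx file; prints the new VM ID
        f"vim-cmd solo/registervm {shlex.quote(vmx_path)}",
        # Check for leftover snapshots before powering on
        f"vim-cmd vmsvc/snapshot.get {new_vmid}",
        # Power the VM back on
        f"vim-cmd vmsvc/power.on {new_vmid}",
    ]


if __name__ == "__main__":
    # Example values only; substitute the real ID and path
    for cmd in recovery_commands(12, "/vmfs/volumes/datastore1/app01/app01.vmx"):
        print(cmd)
```

The script only prints the commands so they can be reviewed before anything is run on the host; nothing is executed against ESXi.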