Posts Tagged ‘Vmware’

Recently, during a Veeam backup, HA kicked in because of a problem with vCenter and marked two machines as Invalid.

  • Veeam started the backup and created a snapshot on the affected VMs
  • HA kicked in after the vCenter Server failure
  • vCenter tried to move the machines after the HA event, but they were locked by the Veeam snapshot (the hypervisor holds a lock because of the snapshot)
  • The machines couldn’t move, but VMware removed them from the inventory and tried to register them on another host, without success
  • The machines were marked as Invalid; removing them from the inventory and registering them manually did not work
  • The machines were locked but still powered on and responsive

Fix to get the VMs registered in vCenter

  • Stop any tasks (e.g. Veeam jobs) that might trigger a snapshot on the affected machines
  • Temporarily disable DRS in the cluster (when we power the machine back on we don’t want vSphere to start it on another host, otherwise the lock will persist)
  • Power the affected VMs off gracefully from within the OS
  • Log in to vCenter and select the host that failed. Unregister the “unknown” VMs listed (these should be the actual VMs). Unregistering should remove the lock.
  • Register the virtual machines using the .vmx files in the datastore
  • Check that no snapshots are present; if there are, delete or consolidate them
  • Power on the VMs
  • Re-enable DRS

SRM Error – Error creating test bubble image from group instance ‘RGID-xxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx’ of group ‘GID-xxxxxx-xxxx-xxxx-xxxx-xxxxxx’ on VR Server ‘localhost.localdom’ (address ‘xxx.xxx.xxx.xxx’). VRM Server generic error. Please check the documentation for any troubleshooting information. The detailed exception is: ‘java.lang.NullPointerException’

This happens because of a problem with the RGID, which needs to be recreated using the following method for each affected VMware machine:

1) Rename the datastore folder of the problem VM on the remote side to something else, such as vmname.old

2) Wait for the machine’s replication status to show as not active in vSphere Replication (you can use Pause Replication and then Resume to force this)

3) Stop and remove the replication

4) Rename the VM folder back to its previous name

5) Reconfigure replication and use the existing data as a seed


When trying to test a Recovery Plan (DR bubble), we saw the following error show up under Errors as two of the machines were starting up:

Error – Unable to copy the configuration file ‘MachineName.vmx’ from the host to ‘C:\Windows\TEMP\vmware-SYSTEM\machinename.vmx324-0’  – No file exists for given path.

Because the GUI is sometimes out of date, we checked the replication status manually by SSHing into the host running the virtual machine and using:

vim-cmd vmsvc/getallvms | grep -i vmname

This gives you the ID of the machine, which you can then pass to:

vim-cmd hbrsvc/vmreplica.getState vmnameid

The machine’s state was reported as just IDLE rather than Inactive, which is incorrect.
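
Picking the ID out of the getallvms listing can also be scripted if you save the output to a file. This is a minimal Python sketch, assuming the Vmid is the first whitespace-separated column of each VM row; the function name find_vm_ids is made up for illustration and is not part of any VMware tooling.

```python
from typing import Iterable, List, Tuple


def find_vm_ids(listing: Iterable[str], name_fragment: str) -> List[Tuple[int, str]]:
    """Return (vmid, row) pairs for rows that mention name_fragment.

    Mirrors `grep -i vmname`: the match is case-insensitive and looks at
    the whole row. Rows whose first column is not a number (the header,
    or continuation lines of a multi-line annotation) are skipped.
    Assumption: the Vmid is the first column of each VM row.
    """
    needle = name_fragment.lower()
    matches = []
    for line in listing:
        if needle not in line.lower():
            continue
        fields = line.split()
        if fields and fields[0].isdigit():
            matches.append((int(fields[0]), line.strip()))
    return matches
```

Pass it the lines of the saved listing (for example an open file object) together with part of the VM name; the number in each returned pair is what goes in place of vmnameid in the command above.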

We then ran this command in /var/log/:

grep -i hbr vmkernel.log | grep " LWD delta transfer terminated (aborted)" | awk '{print $10}' | sort -u

This brought up a load of disk IDs that had issues, and one of them belonged to our VM. In the end we had to remove replication and reseed.
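
The same filter can be run offline against a copy of vmkernel.log. This is a rough Python sketch of the pipeline above, not a replacement for it: it keeps lines that mention hbr and the aborted-transfer message, then collects the unique values of field 10 as awk would. The function name is hypothetical, and the idea that field 10 holds the disk ID comes straight from the command above, so check it against your own log format.

```python
from typing import Iterable, List

# Same literal text the second grep looks for, including the leading space.
MARKER = " LWD delta transfer terminated (aborted)"


def aborted_transfer_fields(lines: Iterable[str], field: int = 10) -> List[str]:
    """Return sorted, unique values of one whitespace-separated field.

    `field` is 1-based, like awk's $10. Only lines that contain "hbr"
    (case-insensitive) and MARKER are considered. Lines with fewer than
    `field` columns are skipped, whereas awk would print an empty line.
    """
    found = set()
    for line in lines:
        if "hbr" not in line.lower():
            continue
        if MARKER not in line:
            continue
        parts = line.split()
        if len(parts) >= field:
            found.add(parts[field - 1])
    return sorted(found)
```

Calling aborted_transfer_fields(open("vmkernel.log")) on a copied log should give the same kind of list as the shell one-liner, although the ordering may differ slightly from sort -u depending on locale.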


Recently we were having issues with how long our Veeam backups of SQL databases were taking. Veeam creates a snapshot of a server, which can subsequently be copied to disk or to tape. The SQL servers were having their indexes fully rebuilt every night.

“If you’re using the FULL recovery model, the entire index rebuild operation is fully logged, which means the transaction log file must be at least as large as the index being rebuilt. It also means the next transaction log backup will essentially contain the entire index.” (Per http://sqlmag.com/blog/it-bad-idea-rebuild-all-indexes-every-night)

This ballooned the storage needed on the SAN by more than 1TB each night, because of the transaction log writes and the changes since the initial snapshot, and it also slowed down the overall backup process.

A smarter way to maintain the indexes each night is to analyse which indexes are fragmented and only reindex those.

This approach is described here: http://blogs.technet.com/b/sql_server_isv/archive/2010/10/18/index-fragmentation-if-it-isn-t-broke-don-t-fix-it.aspx

Indexing tasks and scripts can be found here: http://technet.microsoft.com/en-us/library/ms189858.aspx
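
As a rough illustration of reindexing only what is fragmented, the sketch below turns per-index fragmentation figures (for example, taken from sys.dm_db_index_physical_stats) into ALTER INDEX statements. The 5% and 30% thresholds follow the commonly quoted Microsoft guidance (reorganize between 5% and 30%, rebuild above 30%). The 1000-page minimum, the IndexStat class and the function name are assumptions made for illustration, so tune them for your own servers.

```python
from dataclasses import dataclass
from typing import Optional

REORGANIZE_ABOVE = 5.0   # percent; at or below this, leave the index alone
REBUILD_ABOVE = 30.0     # percent; above this, rebuild instead of reorganize
MIN_PAGES = 1000         # assumption: ignore very small indexes


@dataclass(frozen=True)
class IndexStat:
    """One row of fragmentation data (hypothetical container)."""
    schema_name: str
    table_name: str
    index_name: str
    fragmentation_pct: float  # avg_fragmentation_in_percent
    page_count: int


def maintenance_statement(stat: IndexStat) -> Optional[str]:
    """Return the ALTER INDEX statement for this index, or None to skip it.

    Names are wrapped in square brackets and are assumed not to contain
    a closing bracket themselves.
    """
    if stat.page_count < MIN_PAGES:
        return None
    if stat.fragmentation_pct <= REORGANIZE_ABOVE:
        return None
    action = "REBUILD" if stat.fragmentation_pct > REBUILD_ABOVE else "REORGANIZE"
    return (
        f"ALTER INDEX [{stat.index_name}] "
        f"ON [{stat.schema_name}].[{stat.table_name}] {action};"
    )
```

Because a reorganize or a skipped index generates far less transaction log than a full rebuild, a job driven by something like this should write much less to the log each night than rebuilding everything, which is the point of the articles linked above.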

 

 
