Over the last few years, server virtualization has been rapidly gaining popularity in enterprise network environments. Harnessing virtualization makes it possible to build truly dynamic datacenters that can respond quickly to an organization’s needs. In response to the demand for greater flexibility and agility, Microsoft added Live Migration to the R2 release of Windows Server 2008. Live Migration allows an administrator to move a running virtual machine from one physical host to another with no perceived downtime.
To take advantage of Live Migration, there are a few prerequisites. First, your organization needs some form of shared storage, e.g. an iSCSI or Fibre Channel SAN, to hold your virtual machine files. This matters because when the VHD files already live on shared storage, Live Migration only has to transfer the memory state and ownership of the VM being migrated, rather than copying an entire VHD file.
The next step is to configure the Failover Clustering feature, which can be set up at either the host level or the application level. In a Hyper-V host-level failover cluster, the VMs themselves are made highly available, rather than just the applications they host. Guest OS failover clusters, built between VMs, keep the applications running inside those VMs highly available, for example SQL Server or Exchange. In our scenario, we want a host-level cluster because we want to move the entire VM, not just a single service.
The way Live Migration actually works is pretty interesting. Once your VMs are configured to be highly available, Live Migration can be initiated from Failover Cluster Manager. Once invoked, the memory pages of the VM being migrated begin to be copied from the source host to the destination host. The complication is that while the pages are being copied, the VM is still running, and therefore still modifying its own memory. To handle this, every change to the memory state is tracked during the migration, and pages that are modified after being copied are marked as “dirty pages.” Copying the memory state therefore has to be an iterative process: each pass resends the pages dirtied during the previous pass. The logic is that each pass has fewer pages to send, so it finishes faster and leaves less time for new pages to be dirtied. As long as the VM dirties pages more slowly than the network can copy them, the number of dirty pages keeps shrinking until nearly all of the VM’s working memory is already on the destination host.
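To make the convergence argument concrete, here is a toy Python model of iterative pre-copy. It is a sketch under simplifying assumptions, not Hyper-V’s implementation: the function name `precopy_rounds` and all numbers are hypothetical, the network copies pages at a constant rate, the VM dirties pages at a constant rate, and every page dirtied during a round must be resent in the next one.

```python
def precopy_rounds(total_pages, copy_rate, dirty_rate, max_rounds):
    """Toy model of iterative pre-copy (hypothetical, not Hyper-V's code).

    Round 1 copies every page of guest memory. Each later round recopies only
    the pages the still-running VM dirtied while the previous round was in
    flight. Rates are in pages per second and assumed constant.
    Returns the number of pages sent in each round.
    """
    sent = []
    to_send = float(total_pages)
    for _ in range(max_rounds):
        sent.append(round(to_send))
        elapsed = to_send / copy_rate  # seconds this round took
        # Pages dirtied during this round; can never exceed total memory.
        to_send = min(float(total_pages), dirty_rate * elapsed)
        if to_send < 1:  # nothing meaningful left to copy
            break
    return sent


# Hypothetical VM: 1,000,000 pages (about 4 GB at 4 KiB per page), a link that
# copies 100,000 pages/s, and a guest that dirties 10,000 pages/s.
print(precopy_rounds(1_000_000, 100_000, 10_000, max_rounds=5))
# [1000000, 100000, 10000, 1000, 100]

# If the guest dirties pages faster than the link can copy them, the rounds
# never shrink, so pre-copy alone would not converge.
print(precopy_rounds(1_000_000, 100_000, 200_000, max_rounds=3))
# [1000000, 1000000, 1000000]
```

The first call shows the "fewer dirty pages each pass" behavior described above; the second shows why it depends on the network outpacing the guest’s memory writes.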
Now, here is where things get really clever. During the iterations, the hosts continuously compute how many dirty pages remain on the source. They also take into account the TCP timeout interval and other traffic on the network. Once the remaining dirty pages are few enough to be transferred to the destination host within the TCP timeout interval, the following steps are performed:
1) The VM is paused on the source host
2) The remaining dirty pages are transferred to the destination host
3) Ownership of the VM (on the SAN) is transferred from the source to the destination host
4) ARP packets are sent to update the switches’ MAC address tables
5) The VM is resumed on the destination host
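The switchover decision can be sketched the same way. The Python below is a hypothetical model, not Hyper-V’s actual logic: it assumes 4 KiB pages, a constant bandwidth, an illustrative 1-second timeout, and a 50% safety margin (`margin`) so the final copy finishes well inside the timeout. The names `plan_switchover`, `blackout_seconds`, and `SWITCHOVER_STEPS` are made up for illustration, and the step labels simply restate the five actions listed above.

```python
PAGE_SIZE = 4096  # bytes; assumed 4 KiB pages

# The five switchover actions, in order (labels restate the list above).
SWITCHOVER_STEPS = (
    "pause VM on source host",
    "copy remaining dirty pages to destination",
    "transfer VM ownership on shared storage",
    "send ARP packets to update switch tables",
    "resume VM on destination host",
)


def blackout_seconds(dirty_pages, bandwidth_bytes_per_s):
    """Estimated time the VM stays paused while the last pages are copied."""
    return dirty_pages * PAGE_SIZE / bandwidth_bytes_per_s


def plan_switchover(dirty_per_round, bandwidth_bytes_per_s, timeout_s, margin=0.5):
    """Return (round_index, steps) for the first pre-copy round whose remaining
    dirty pages could be sent within margin * timeout_s, or None otherwise.
    """
    budget = timeout_s * margin
    for i, pages in enumerate(dirty_per_round):
        if blackout_seconds(pages, bandwidth_bytes_per_s) < budget:
            return i, list(SWITCHOVER_STEPS)
    return None  # keep iterating: the final copy would take too long


# Hypothetical dirty-page counts left after successive rounds, over a
# 1 Gb/s link (125,000,000 bytes/s) with an assumed 1-second timeout.
plan = plan_switchover([1_000_000, 100_000, 10_000, 1_000], 125_000_000, 1.0)
print(plan[0])  # 2: 10,000 pages take about 0.33 s, inside the 0.5 s budget
for step in plan[1]:
    print(step)
```

The margin is a design choice in this sketch: pausing only when the estimate is comfortably below the timeout leaves room for the ownership transfer, the ARP update, and competing traffic.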
All of this happens so quickly that clients accessing services over TCP typically will not notice the transfer, and even if they do, at most a packet or so may need to be retransmitted. Pretty cool, huh?
So, what are the benefits? First, planned downtime of a physical host can become a thing of the past. Need to add more RAM, swap out a processor, or patch and reboot the host? Not a problem: simply Live Migrate your VMs to another host and do what you’ve got to do. Your users will not even notice a hiccup. An even more powerful benefit is the ability to dynamically move virtual machines to different hosts, sites or environments based on service demand, or even an imminent failure. Because the ability to respond proactively to significant events is so critical, System Center Virtual Machine Manager (SCVMM) integrates with System Center Operations Manager through a feature called PRO (Performance and Resource Optimization), which can automate the entire dynamic re-provisioning process. SCVMM also uses Intelligent Placement, which evaluates candidate hosts to find the best destination for a workload based on past and projected performance metrics.
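For a feel of what "finding the best candidate" involves, here is a deliberately simple greedy placement sketch in Python. It is not SCVMM’s Intelligent Placement algorithm: the `Host` fields, the averaging "projection", and all numbers are hypothetical. It filters out hosts without enough free memory or CPU headroom, then picks the host with the lowest projected CPU load after the move.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    free_memory_mb: int
    cpu_history: list[float]  # past CPU utilization samples, 0.0 to 1.0 (non-empty)


def projected_cpu(host, vm_cpu):
    """Naive projection: average past utilization plus the VM's expected load."""
    return sum(host.cpu_history) / len(host.cpu_history) + vm_cpu


def best_host(hosts, vm_memory_mb, vm_cpu):
    """Return the name of the least-loaded host that can fit the VM, or None."""
    candidates = [
        h for h in hosts
        if h.free_memory_mb >= vm_memory_mb and projected_cpu(h, vm_cpu) <= 1.0
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda h: projected_cpu(h, vm_cpu)).name


hosts = [
    Host("A", free_memory_mb=8192, cpu_history=[0.6, 0.7, 0.8]),
    Host("B", free_memory_mb=2048, cpu_history=[0.1, 0.2]),  # too little RAM here
    Host("C", free_memory_mb=16384, cpu_history=[0.3, 0.5]),
]
print(best_host(hosts, vm_memory_mb=4096, vm_cpu=0.2))  # C
```

A real placement engine weighs many more factors; the point here is only the shape of the decision: filter on hard constraints, then rank the survivors on projected load.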
We have been hearing the term “Dynamic Data Center” for quite some time now, and capabilities such as Live Migration truly help turn that notion into reality. I’m sure any administrator would love to have this feature at their disposal.