The real questions about Enhanced VMotion Compatibility

More and more of the blog posts I read talk about Enhanced VMotion Compatibility (EVC) and what it does, but they fail to answer some specific questions I have. So I started a thorough investigation to try to answer the remaining questions myself. Below I quote a few passages from several posts and show you which questions they raised for me.

Q: Will EVC work for all applications running inside a VM?

A: No. EVC, like the “old-fashioned” CPU masks you can apply to a VM, only works for well-behaved applications: those that query the CPUID instruction before using a special CPU instruction.

VMware KB article 1005763: “To facilitate this, CPUs with Intel VT FlexMigration and AMD-V Extended Migration modify the results reported by the CPUID instruction. Because well-behaved applications always rely on the results of the CPUID instruction, only they can fully benefit from the EVC feature. Ill-behaved applications that use other methods of feature detection will not benefit from this feature and might become unstable or even fail on systems that use the EVC feature. Note that such behavior is possible on native hardware as well.”
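The difference between well-behaved and ill-behaved applications can be illustrated with a toy model. This Python sketch is purely illustrative: the feature names, the baseline and both “applications” are hypothetical, and real masking is done by the hypervisor and the CPU (Intel VT FlexMigration / AMD-V Extended Migration), not by guest code like this.

```python
# Toy model of EVC-style CPUID masking. Feature names and the baseline are
# hypothetical examples, not an actual EVC mode definition.

HOST_FEATURES = {"sse", "sse2", "sse3", "ssse3", "sse4_1"}  # what this newer CPU really has
EVC_BASELINE = {"sse", "sse2", "sse3"}                       # what the cluster baseline allows


def cpuid(host_features, baseline):
    """Feature set reported to the guest: real features masked by the baseline."""
    return host_features & baseline


def well_behaved_app(reported):
    """Asks CPUID first, then picks a code path the CPU claims to support."""
    if "sse4_1" in reported:
        return "sse4_1 path"
    return "sse2 fallback path"


def ill_behaved_app(real_features):
    """Skips CPUID and detects features some other way (modelled here as
    looking at the real hardware features), so it sees what EVC hides."""
    if "sse4_1" in real_features:
        return "sse4_1 path"
    return "sse2 fallback path"


reported = cpuid(HOST_FEATURES, EVC_BASELINE)
print(well_behaved_app(reported))      # sse2 fallback path: valid on every host in the cluster
print(ill_behaved_app(HOST_FEATURES))  # sse4_1 path: may fail after VMotion to a host without SSE4.1
```

The well-behaved application only ever sees the masked feature set, so whatever it chooses stays valid after a VMotion; the ill-behaved one bypasses the mask and can end up using instructions an older host in the cluster does not have.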

Q: How do I enable EVC?

A: To enable EVC, you first have to power off all VMs in the cluster. You then enable EVC in the cluster's properties in VirtualCenter, just as you would enable HA and DRS.

Q: Is there a way to enable EVC without shutting down all your VMs?

A: Yes, there is, but depending on the size of your environment it may or may not be right for you. First, VMotion all VMs off one host in the cluster and move that host into a newly created cluster. Enable EVC on the new cluster, then VMotion VMs from the old cluster to the new one, emptying another host, which you can then move to the new cluster as well. Repeat this until all hosts have been moved. This way, you can enable EVC without powering off any VMs.

Remark: Be sure to start with the “oldest” host, because it will determine your baseline. At the time of writing (November 2008), there aren't many differences between CPU families that result in different baselines, but in the future there will be.

There is a drawback to this method, however: group permissions and DRS/HA rules you've created at the cluster level are lost, and there is no way to export them from the old cluster and import them into the new one. You have to re-enter them by hand. So whether this method works for you depends on how many rules you have created.
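The rolling procedure can be written down as a small planner that lists the steps in order. This is a hypothetical sketch: the `Host` record, the feature labels and the step wording are made up for illustration. It assumes CPU feature sets are nested (so the host with the fewest features is the “oldest” one and goes first), and it ignores host capacity and the per-VM compatibility checks VirtualCenter performs during VMotion.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    features: frozenset  # hypothetical CPU feature labels
    vms: list = field(default_factory=list)


def plan_rolling_evc(hosts):
    """List the steps for moving hosts one at a time into a new EVC cluster."""
    if len(hosts) < 2:
        raise ValueError("need at least two hosts to evacuate one without downtime")
    # Assumption: fewest features == oldest host; it moves first and sets the baseline.
    order = sorted(hosts, key=lambda h: len(h.features))
    baseline = order[0].features
    placement = {h.name: list(h.vms) for h in order}  # which VMs run on which host
    steps = []
    for i, host in enumerate(order):
        if not baseline <= host.features:
            raise ValueError(f"{host.name} lacks baseline features")
        # The first host is emptied onto the next host, still in the old cluster;
        # every later host is emptied into the new EVC cluster.
        dest = f"{order[1].name} in the old cluster" if i == 0 else "the new EVC cluster"
        for vm in placement[host.name]:
            steps.append(f"VMotion {vm} from {host.name} to {dest}")
            if i == 0:
                placement[order[1].name].append(vm)
        placement[host.name] = []
        steps.append(f"Move {host.name} into the new cluster")
        if i == 0:
            steps.append(f"Enable EVC on the new cluster (baseline: {host.name})")
    # Cluster-level settings cannot be exported, so they must be redone manually.
    steps.append("Re-create group permissions and DRS/HA rules by hand")
    return steps


hosts = [
    Host("esx02", frozenset({"sse", "sse2", "sse3", "sse4a"}), ["web01"]),
    Host("esx01", frozenset({"sse", "sse2", "sse3"}), ["db01"]),
]
for step in plan_rolling_evc(hosts):
    print(step)
```

Note how the VMs parked on the second host during the first step have to be moved again later; with many hosts, that extra VMotion traffic is part of why this approach may not suit large environments.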

Q: How can I add an older type host to a configured EVC cluster?

A: You can only do this by powering off all VMs in that cluster.

VMware KB article 1005764: “The virtualization layer automatically creates and applies masks required to preserve CPU compatibility and ensure application stability.” and “These host CPUs need to meet certain criteria to enter a cluster, but it is not necessary that all CPUs be exactly the same.”

From this we learn that when you create a cluster, add some hosts to it and enable EVC, VirtualCenter checks the hosts and applies a baseline feature set to all of them, based on the CPU with the lowest feature set. The EVC settings also state clearly that “Once enabled, EVC will ensure that only hosts that are compatible with those in the cluster may be added to the cluster”.

While experimenting with this, I thought: maybe I can turn off EVC, add the host and then enable EVC again. Although I didn't have an older host available to actually test this, I wanted to find out what disabling EVC meant. I learned the hard way: never disable EVC on a production cluster, because you can't enable it again for as long as your VMs are running. HA and DRS can be disabled and re-enabled on a cluster whenever you need to, but doing the same with EVC forces you to power off all your VMs. It would be great if VMware added an extra warning dialog when you try to disable EVC.
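The baseline and admission rule described above can be modelled with plain set operations. This is a minimal sketch under a stated assumption: each CPU is reduced to a set of hypothetical feature labels, and newer CPUs are supersets of older ones, so “the CPU with the lowest feature set” is the same as the intersection of all hosts' features. Real EVC uses predefined baselines (see KB 1003212), not arbitrary intersections.

```python
from functools import reduce

# Hypothetical feature labels; nested sets model "newer CPU = older CPU + extras".
BASE_CPU = frozenset({"sse", "sse2", "sse3"})
NEWER_CPU = BASE_CPU | {"sse4a"}
OLDER_CPU = frozenset({"sse", "sse2"})


def evc_baseline(host_features):
    """Baseline = the features every host has in common (the 'lowest' CPU wins)."""
    return reduce(lambda a, b: a & b, host_features)


def can_join(baseline, new_host):
    """Once EVC is on, a host may join only if it offers every baseline feature."""
    return baseline <= new_host


baseline = evc_baseline([BASE_CPU, NEWER_CPU])
print(baseline == BASE_CPU)           # True
print(can_join(baseline, NEWER_CPU))  # True: newer hosts are masked down to the baseline
print(can_join(baseline, OLDER_CPU))  # False: admitting it would require a lower baseline
```

In this model a newer host can always join, because masking only hides features. An older host can never join an EVC cluster as-is, because the baseline would have to shrink, and running VMs may already be using the features that would disappear.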

Because all my test hardware falls within the “AMD Second Generation Opteron” range, I couldn't test adding an older host; besides, there is no older AMD range that meets the requirements for EVC (see VMware KB 1003212). But since all VMs have to be powered off, I think it is safe to say that adding an older host will lower the baseline. You just have to go through the whole procedure: power off your VMs, disable EVC, add the host, re-enable EVC and then power your VMs on again.
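The rules from this post (enabling EVC requires all VMs to be powered off, disabling is always allowed, incompatible hosts are rejected while EVC is on) can be captured in a small state model, together with the full add-an-older-host procedure. This is a hypothetical sketch, not the VirtualCenter API: the `EVCCluster` class and `add_older_host` function are made up, and recomputing the baseline as an intersection of feature sets encodes my assumption that an older host lowers the baseline, which I could not verify on real hardware.

```python
from functools import reduce
from typing import Optional


class EVCCluster:
    """Toy model of the EVC rules described in this post; not the VirtualCenter API."""

    def __init__(self, hosts):
        self.hosts = dict(hosts)       # host name -> frozenset of CPU feature labels
        self.powered_on_vms = set()
        self.baseline: Optional[frozenset] = None  # None means EVC is disabled

    def add_host(self, name, features):
        # With EVC enabled, only hosts that offer the whole baseline may join.
        if self.baseline is not None and not self.baseline <= features:
            raise ValueError(f"{name} is not compatible with the EVC baseline")
        self.hosts[name] = features

    def disable_evc(self):
        # Disabling is always allowed, which is exactly the trap.
        self.baseline = None

    def enable_evc(self):
        # Enabling requires every VM in the cluster to be powered off.
        if self.powered_on_vms:
            raise RuntimeError("power off all VMs before enabling EVC")
        self.baseline = reduce(lambda a, b: a & b, self.hosts.values())


def add_older_host(cluster, name, features):
    """Power off, disable EVC, add the host, re-enable EVC, power on again."""
    vms = set(cluster.powered_on_vms)
    cluster.powered_on_vms.clear()     # 1. power off all VMs
    cluster.disable_evc()              # 2. disable EVC
    cluster.add_host(name, features)   # 3. add the older host
    cluster.enable_evc()               # 4. re-enable EVC; the baseline is recomputed
    cluster.powered_on_vms |= vms      # 5. power the VMs on again
```

In this model, calling `disable_evc()` on a cluster with running VMs and then `enable_evc()` raises an error, which is the situation I ran into; only the complete power-off sequence in `add_older_host` gets you back to an enabled, lower baseline.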