How too many vCPUs can negatively affect performance

A customer with a small vSphere environment of just two hosts had performance issues and asked me to investigate. Looking at the technical specs, at first glance you would expect this configuration to work fine. With just two hosts, each with two quad-core CPUs, the environment had a total of 16 CPU cores. Only 9 VMs were running, with a total of 23 vCPUs assigned. Usually you can easily run 5 vCPUs per core, if not more.
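To put a number on that rule of thumb: 23 vCPUs on 16 physical cores is an overcommitment ratio of only about 1.4:1, nowhere near the roughly 5:1 that is usually fine. A quick back-of-the-envelope check in Python, using the figures from this environment:

    # Back-of-the-envelope vCPU overcommitment check for this environment.
    hosts = 2
    cores_per_host = 8              # two quad-core CPUs per host
    total_cores = hosts * cores_per_host

    vcpus_assigned = 23             # spread over 9 VMs
    ratio = vcpus_assigned / total_cores

    print(f"Physical cores : {total_cores}")
    print(f"vCPUs assigned : {vcpus_assigned}")
    print(f"Overcommitment : {ratio:.2f} vCPUs per core")   # ~1.44, far below ~5

On paper, that should be no problem at all. Yet it was.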

It turned out there were a number of VMs with 4 or 2 vCPUs that really didn't need them. You can easily discover this by checking the READY and CO-STOP values in the vCenter performance charts. Go to the Performance tab of your vSphere host and select Advanced. Click "Chart Options", go to the CPU section, choose the "Realtime" interval, and on the right deselect all counters except "Ready" and "Co-stop". Click OK.

READY: The time a virtual machine must wait in a ready-to-run state before it can be scheduled on a physical CPU.
CO-STOP: The amount of time an SMP virtual machine was ready to run but incurred delay because of co-vCPU scheduling contention.
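Both counters show up in the realtime chart as summation values in milliseconds per 20-second sample, which makes them hard to judge at a glance. Converting them to a percentage, as esxtop reports them, is straightforward; a small helper, assuming the 20-second realtime sample interval:

    # Convert a READY or CO-STOP summation value (milliseconds) from the
    # vCenter realtime chart into a percentage as shown by esxtop.
    def summation_to_percent(summation_ms: float, interval_s: int = 20) -> float:
        # The realtime chart samples every 20 seconds; other chart periods
        # (daily, weekly, ...) use longer sample intervals.
        return summation_ms / (interval_s * 1000) * 100

    # Example: 2000 ms of ready time within one 20-second sample equals 10%,
    # a value at which users will usually already notice sluggishness.
    print(summation_to_percent(2000))   # 10.0

As a rough rule of thumb, a READY value that works out to more than about 5% per vCPU is worth investigating.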

I made a list of all VMs and their vCPUs, and checked per VM what the real CPU usage had been over the past weeks. None of the VMs showed excessive CPU usage, so downsizing the vCPUs wouldn't be a problem. Unfortunately, that had to be done outside office hours. This was the vCPU assignment when I started investigating (remember, each ESXi host has only 8 cores); a scripted way to pull the same list is sketched below the table:

Current situation

    VM                 vCPUs
    Mmgt               1
    Citrix 1           4
    vCenter            2
    Application        4
    Develop            1
    Exchange           2
    DomainController   1
    SQL                4
    Citrix 2           4
    Total              23  (esx01: 12, esx02: 11)
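If you prefer not to click through every VM by hand, the same inventory can be pulled with a short script. A rough pyVmomi sketch (hostname and credentials are placeholders; the historical peaks I still read from the weekly performance charts, this only shows current demand):

    # Rough pyVmomi sketch: list every powered-on VM with its vCPU count,
    # the host it runs on and its current CPU demand in MHz.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    si = SmartConnect(host="vcenter.example.local",            # placeholder
                      user="administrator@vsphere.local",      # placeholder
                      pwd="***",
                      sslContext=ssl._create_unverified_context())
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.VirtualMachine], True)
        for vm in view.view:
            if vm.runtime.powerState != vim.VirtualMachinePowerState.poweredOn:
                continue
            print(f"{vm.name:20} {vm.summary.config.numCpu} vCPU "
                  f"on {vm.runtime.host.name} "
                  f"using ~{vm.summary.quickStats.overallCpuUsage} MHz")
    finally:
        Disconnect(si)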

So I had two big Citrix VMs that were in use, but not heavily used, with 4 vCPUs each. On top of that, an application server with 4 vCPUs and a SQL server with 4 vCPUs. The application server and SQL server had peaked at a maximum of 1 GHz over the past weeks. That is ONE GHz for a 4-vCPU VM. A bit overdone, don't you think?
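To make that concrete: assuming cores of around 2.5 GHz (a placeholder figure, the exact clock speed doesn't change the conclusion), those 4 vCPUs represent roughly 10 GHz of schedulable capacity, while the observed peak fits comfortably within a single core:

    core_ghz = 2.5                        # assumed clock speed per core
    vcpus = 4
    provisioned_ghz = vcpus * core_ghz    # ~10 GHz worth of schedulable CPU
    observed_peak_ghz = 1.0               # measured peak over the past weeks

    print(f"Peak demand: {observed_peak_ghz / provisioned_ghz:.0%} of provisioned capacity")
    print(f"Fits within one core: {observed_peak_ghz <= core_ghz}")   # True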

Since I had to wait until the evening before I could downsize, I decided to go for a small quick win and shuffle the VMs around, to create the following situation:

After vMotion

    VM                 vCPUs
    Mmgt               1
    Citrix 1           4
    vCenter            2
    Application        4
    Develop            1
    Exchange           2
    DomainController   1
    SQL                4
    Citrix 2           4
    Total              23  (esx01: 8, esx02: 15)

I put the two Citrix VMs together on one host, which would stop their 'fight' for free CPU cores. The second host now had a few more vCPUs to handle, but since the two big VMs didn't need that much CPU time, I figured there would be enough capacity on that host to at least make it to the evening without issues.

The following image shows the READY and CO-STOP values before and after the vMotion at 12:45 PM. You can clearly see a big drop in both.

Later in the evening I would change the number of vCPUs: the vCenter Appliance VM from 2 to 1, the Application VM from 4 to 2, and the SQL VM from 4 to 2. A scripted version of that change is sketched after the table.

After vCPU change

    VM                 vCPUs
    Mmgt               1
    Citrix 1           4
    vCenter            1
    Application        2
    Develop            1
    Exchange           2
    DomainController   1
    SQL                2
    Citrix 2           4
    Total              18  (esx01: 8, esx02: 10)
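For reference, the change itself can also be scripted. A rough pyVmomi sketch of the procedure I performed by hand that evening (graceful guest shutdown, lower the vCPU count, power back on); vm is a vim.VirtualMachine object as in the inventory sketch above, and it assumes VMware Tools is running in the guest:

    import time
    from pyVim.task import WaitForTask
    from pyVmomi import vim

    def downsize_vcpus(vm, new_vcpu_count):
        """Shut the guest down, set a new vCPU count and power the VM back on."""
        # Graceful shutdown via VMware Tools, then wait until the VM is off.
        vm.ShutdownGuest()
        while vm.runtime.powerState != vim.VirtualMachinePowerState.poweredOff:
            time.sleep(5)

        # Lowering the vCPU count requires the VM to be powered off
        # (CPU hot-remove is not supported).
        spec = vim.vm.ConfigSpec(numCPUs=new_vcpu_count)
        WaitForTask(vm.ReconfigVM_Task(spec=spec))

        WaitForTask(vm.PowerOnVM_Task())

    # e.g. downsize_vcpus(sql_vm, 2); downsize_vcpus(app_vm, 2)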

As the following image shows, there is again a drop in the READY and CO-STOP values. At 9:20 PM both the SQL and Application servers were shut down and started again with 2 vCPUs. At 9:55 PM vCenter was robbed of its second vCPU.

Conclusion: Think twice before you give VMs extra vCPUs they don't really need. You can negatively impact the performance of your whole environment, because the VMkernel has to find time slots in which it can give all of a VM's vCPUs access to physical cores at the same time.