Using vscsiStats (the full how-to)

After reading verbeieren’s blog about vscsiStats, I was very interested to see what this tool could do for me. So I started to play with it and found that I needed some extra hints to get started. Enough for another blog post, I thought.

With vscsiStats you can monitor the IO through the virtual SCSI controller of a VM (or multiple VMs). vscsiStats works from the console and outputs to the console. The data can be exported as a CSV file so you can import it into Excel or any other tool. The way vscsiStats works is that you first enable statistics collection on one or more VMs and then take samples at several moments. Never forget to stop the collection when you’re done, because it causes some performance degradation while it runs.

The vscsiStats command can be found in /usr/lib/vmware/bin and is not part of the path environment, which means that you either first change to this directory or refer to the full path every time you use the command: /usr/lib/vmware/bin/vscsiStats.
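A third option is to add the directory to the path for the current session. This is only a small sketch, assuming a POSIX-style shell on the console; the variable name VSCSI_DIR is made up for this example, and the change is lost when you log out.

```shell
# Directory that holds vscsiStats (taken from the text above).
VSCSI_DIR=/usr/lib/vmware/bin

# Append it to PATH only if it is not already there.
case ":$PATH:" in
  *":$VSCSI_DIR:"*) ;;                          # already present, nothing to do
  *) PATH="$PATH:$VSCSI_DIR"; export PATH ;;    # add for this session only
esac
```

After this you can type vscsiStats instead of ./vscsiStats from any directory.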

As mentioned above, you first have to start the data collection. If you’re troubleshooting a single VM, it’s best to collect stats for just that VM. Before you can do that, you have to find the worldgroup ID (WID) of the VM. The strange thing is that the ‘normal’ method for this (vm-support -x) reports slightly different WIDs than the ones vscsiStats needs. It is better to first run ./vscsiStats -l, which gives you a list of all available virtual machines and their disks. In my example I want to check my home Exchange server, called w2k3-ex01-64bit:

./vscsiStats -l
Virtual Machine worldGroupID: 1076,
Virtual Machine Display Name: w2k3-ex01-64bit {
Virtual SCSI Disk handleID: 8195
Virtual SCSI Disk handleID: 8196
Virtual SCSI Disk handleID: 8197
}
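On a host with many VMs it can be handy to pull the WID out of this listing by display name instead of reading it off by eye. The sketch below assumes the listing looks exactly like the sample above and that the display name contains no spaces. It works on a pasted copy of the listing so it can be tried anywhere; on the host you would pipe ./vscsiStats -l into the same awk command. The variable names are just for illustration.

```shell
# Sample listing copied from the output above.
listing='Virtual Machine worldGroupID: 1076,
Virtual Machine Display Name: w2k3-ex01-64bit {
Virtual SCSI Disk handleID: 8195
Virtual SCSI Disk handleID: 8196
Virtual SCSI Disk handleID: 8197
}'

vm_name='w2k3-ex01-64bit'

wid=$(printf '%s\n' "$listing" | awk -v name="$vm_name" '
  # Remember the most recent worldGroupID and strip the trailing comma.
  /worldGroupID:/ { id = $NF; sub(/,$/, "", id) }
  # The display name is the fifth field; print the remembered ID on a match.
  /Display Name:/ { if ($5 == name) { print id; exit } }
')

echo "$wid"
```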

1076 is the WID I need. Now, to start collecting data for just this VM, run:

./vscsiStats -s -w 1076

The result will be something like:

vscsiStats: Starting Vscsi stats collection for worldGroup 1076, handleID 8195
Success.
vscsiStats: Starting Vscsi stats collection for worldGroup 1076, handleID 8196
Success.
vscsiStats: Starting Vscsi stats collection for worldGroup 1076, handleID 8197
Success.

The option -s starts the collection and -w specifies the WID of the VM you want to monitor. After the stats have been running for, let’s say, 5 minutes, you can get a sample with:

./vscsiStats -w 1076 -p all -c

Again, -w is used for the WID. With -p you select the stats you would like to see. You can choose between: all, ioLength, seekDistance, outstandingIOs, latency, interarrival. If you want to put the stats in a nice histogram or spreadsheet, you can use -c to print the output as CSV, and of course you can use ‘ > /tmp/outputfile.csv’ to redirect the output to a file.

After you have collected these values a few times and you’re done, don’t forget to stop the collection.

./vscsiStats -x

You will now see the following output:

vscsiStats: Stopping all Vscsi stats collection for worldGroup 1076, handleID 8195
Success.
vscsiStats: Stopping all Vscsi stats collection for worldGroup 1076, handleID 8196
Success.
vscsiStats: Stopping all Vscsi stats collection for worldGroup 1076, handleID 8197
Success.
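The start, sample and stop steps can be wrapped in one small function so the stop is never forgotten. This is only a sketch, not something that ships with vscsiStats: the function name collect_samples, the VSCSISTATS variable and the output file naming are all made up here. It uses only the options shown above (-s, -w, -p, -c and -x).

```shell
# Path to the tool; can be overridden before calling the function.
VSCSISTATS=${VSCSISTATS:-/usr/lib/vmware/bin/vscsiStats}

# Usage: collect_samples WID COUNT INTERVAL_SECONDS OUTPUT_DIR
collect_samples() {
  wid=$1; count=$2; interval=$3; outdir=$4

  # Start collection for this VM only; give up if that fails.
  "$VSCSISTATS" -s -w "$wid" || return 1

  i=1
  while [ "$i" -le "$count" ]; do
    sleep "$interval"
    # One CSV file per sample, numbered in the order they were taken.
    "$VSCSISTATS" -w "$wid" -p all -c > "$outdir/vscsi-$wid-$i.csv"
    i=$((i + 1))
  done

  # Stop collection again, as shown above.
  "$VSCSISTATS" -x
}
```

For example, collect_samples 1076 3 300 /tmp would take three samples five minutes apart and leave them in /tmp.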

The tricky part is how to get some sensible info out of it. I try to use Excel to convert the data into a nice histogram and would love to have a macro for this. But until I have written one, I have to do it by hand. Keep in mind that these are histograms when looking at the following piece of data:

44, 4095
1405, 4096
10, 8191
837, 8192

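These numbers tell you that 44 IO commands landed in the bucket for a block length of 4095, 1405 in the bucket for 4096, and so on. Each value in the second column is the upper limit of a histogram bucket, so the 4095 bucket counts commands larger than the previous bucket’s limit and up to 4095 bytes, not only commands of exactly 4095 bytes. Be sure to interpret them correctly. Having too many IOs in the 4095 and 8191 buckets could point to a misaligned partition or disk in this case.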
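Until there is an Excel macro, a quick text histogram on the console already helps. The sketch below assumes the input is nothing but "count, bucket value" pairs like the four lines above; real -c output contains more than that, so you would have to cut out the pairs first. It prints the bucket value, the count, the share of all IOs and a bar with one # per 2 percent.

```shell
# The four sample lines from above.
data='44, 4095
1405, 4096
10, 8191
837, 8192'

report=$(printf '%s\n' "$data" | awk -F', *' '
  # Store every pair and keep a running total of the counts.
  NF == 2 { count[++n] = $1; limit[n] = $2; total += $1 }
  END {
    if (total == 0) exit
    for (i = 1; i <= n; i++) {
      pct = 100 * count[i] / total
      bar = ""
      for (j = 0; j < int(pct / 2); j++) bar = bar "#"   # one # per 2 percent
      printf "%8d %8d %5.1f%% %s\n", limit[i], count[i], pct, bar
    }
  }
')

echo "$report"
```

With the sample data the 4096 and 8192 buckets clearly dominate, which is what you would hope to see.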

I hope this guide helps you work with the data. Should anyone have a nice Excel macro to convert the data into histograms, please mail me or post it in the comments.