I frequently read posts on different forums about what the best LUN size is, so I thought I’d write up how I usually approach it.
First of all, there is no ideal LUN size that suits all environments, but I think there is a general calculation you can use to arrive at a LUN size suited to yours. Second, there is no technical difference between small and large LUNs: for read/write performance it doesn’t matter whether your VMDK sits on a 100GB LUN or a 2TB LUN, it’s the total load on the LUN that matters.
The basis of my rule of thumb is that a LUN should hold no more than 30 VMDKs; more VMDKs could impact performance because of disk queuing. With those 30 VMDKs in mind, work out what your average disk size is, and apply some common sense. By that I mean: if you have 100 VMs that use around 15GB each and 2 VMs with 1.5TB, leave those 2 out of the calculation and handle the exceptions later.
Ok, you have determined the average disk size, let’s say 12GB. Multiply this by that magic 30 and you see you would need 360GB per LUN to accommodate 30 VMDKs. But that’s not all; we also need some spare room for VM swap space and for snapshots. The VMs I normally use have 1GB or 2GB of RAM assigned, which gives a 1-2GB swap file per VM. Because most VMs have more than one VMDK, I think it’s safe to say that 30GB of swap space per LUN is sufficient.
The trickiest part is the spare room you should reserve for snapshots. I try to keep snapshots active for as short a time as possible; a week-old snapshot is already very old in my opinion. Of course there can be reasons to keep one around, but normal operations would not require running a snapshot for that long. So how big should the snapshot reserve be? Let’s put it at 15%.
So to summarize and build the formula:
30 x (your average disk size) + 30GB VM swap + 15% of (30 x your average disk size) = calculated LUN size.
And to put the cherry on top, take your calculated LUN size and round it up to the next “handy” number. For example, 444GB I would round up to 500GB, 689GB I would round up to 750GB, and so on.
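If it helps, here is a minimal sketch of that calculation in Python. The list of “handy” sizes is just an example; pick whatever round numbers work for your shop.

```python
def calculated_lun_size(avg_disk_gb, vmdks_per_lun=30, swap_gb=30, snapshot_pct=0.15):
    """Rule of thumb: 30 VMDKs + swap reserve + 15% snapshot reserve."""
    disk_total = vmdks_per_lun * avg_disk_gb
    return disk_total + swap_gb + snapshot_pct * disk_total

def round_to_handy(size_gb, handy_sizes=(250, 300, 400, 500, 600, 750, 1000)):
    """Round up to the next 'handy' number, e.g. 444GB -> 500GB, 689GB -> 750GB."""
    return next((s for s in handy_sizes if s >= size_gb), size_gb)

raw = calculated_lun_size(avg_disk_gb=12)   # 30*12 + 30 + 0.15*360 = 444GB
print(raw, round_to_handy(raw))             # 444.0 500
```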
With this formula I think you can make a basic calculation and get an idea of what could suit your environment. It’s no hard rule; it’s something you have to feel comfortable with, and after some time you may tweak it based on your own experience.
Would love to hear your thoughts and comments on this!!!
If you are using disk queuing as the metric, I am not sure I buy the argument for only 30 VMs. On most storage arrays the target ports can queue 1,000+ IOs; let’s just say 1,000. If you set your ESX server’s HBA queue depth to 255 (the max), each ESX server can queue up to 255 IOs for a given target/LUN, so 4 ESX servers with a queue of 255 each will overload the storage port in this example. Looking at it from the VM perspective, if we limit ourselves to 30 VMs per datastore we are giving each VM a queue of 33.3 IOs, which I think is a lot, since the default queue depth on a physical Windows server is 16. So I’d say 20 queued IOs per VM should be plenty, which means 50 VMs per datastore. The real question, I think, is the VMFS overhead, because that is where I could see a problem: VMFS needs to maintain locking, reservations and metadata (SCSI-2), and that is overhead we just don’t have with NFS.
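In rough Python terms (the port queue of 1,000, the HBA depth of 255 and the 20 IOs per VM are just the example figures above, not vendor specs):

```python
# Back-of-the-envelope version of the queue-depth argument above.
port_queue = 1000    # outstanding IOs the array's target port can queue (example figure)
hba_queue  = 255     # per-LUN queue depth configured on each ESX host's HBA (example figure)

print(4 * hba_queue > port_queue)   # True -> 4 hosts at full queue overload the port
print(port_queue / 30)              # ~33.3 outstanding IOs per VM with 30 VMs per datastore
print(port_queue // 20)             # 50 VMs if ~20 outstanding IOs per VM is sufficient
```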
This is where I see NFS really having potential and completely outperforming FC, because we bypass VMFS entirely and locking/SCSI reservations are no longer a concern… what do you all think? Does anyone else have recommendations for how many VMs to put into a datastore, and does anyone know what VMware says about this topic?
Regards,
Keith
One thing to point out is that the number of VMs per VMFS volume doesn’t matter in and of itself; what matters is the number of *active* VMs.
Gabe, you forgot to leave any room on the LUN for free space. 20% is the recommended amount with 10% a bare minimum.
So your formula should read more like this:
30 x (average disk size) + 30 x (average memory allocation) + 15% of (30 x average disk size) + 20% (as free space) = calculated LUN size.
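Plugging the earlier 12GB example into that revised formula (assuming a 1GB average memory allocation, and assuming the 20% free space is taken on the disk subtotal, since the comment doesn’t spell that out):

```python
avg_disk_gb, avg_mem_gb, vms = 12, 1, 30
disk_total = vms * avg_disk_gb      # 360GB of VMDKs
swap_total = vms * avg_mem_gb       # 30GB of .vswp files
snapshot   = 0.15 * disk_total      # 54GB snapshot reserve
free_space = 0.20 * disk_total      # 72GB free space (assumption: % of disk subtotal)
print(disk_total + swap_total + snapshot + free_space)   # 516.0 -> round up to ~600GB
```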
Keith, my understanding was that the default queue depth for most FC HBA (QLogic/Emulex) drivers in ESX is 64 (although some are 128), and VMware support usually doesn’t want you playing with this without good reason.
As for what VMware has to say on the subject, 50 active VMs per LUN is the maximum suggested, and 12 VMs is what VMware Consulting used to say – but I think a lot of that dates back to ESX 2.x.
20-25 VMs per LUN is what I usually do.
Also, as remote SAN replication is becoming more and more common in VMware environments, I have a few other thoughts. Assume for a second we are talking about Windows VMs only (since most everyone understands those) and that they only require 1 VMDK (C:). I would create two “sister” LUNs:
LUNA = VMDK “C:” drives for each of the virtual machines
LUNB = VSWP & VMDK “D:” drives for each virtual machine (D: contains the Windows pagefile and anything else we don’t need to replicate)
Then I would only have to do SAN replication on LUNA and could completely ignore LUNB for daily replication (although you would need to get the D: VMDKs created on the other end initially – this could be manual).
If you think about it, if you have 30 active VMs on those two LUNs (using your numbers from before) with an average of 12GB VMDKs and 1GB RAM, you have 75GB of useless data to replicate (1GB vswp plus a 1.5GB pagefile VMDK – assuming 1.5x RAM – times 30 VMs) over the WAN. This is also probably the data that changes most often (at least the pagefiles). Why replicate it if you don’t have to…
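A quick sketch of that arithmetic, using the same example numbers (1GB RAM per VM and a pagefile sized at 1.5x RAM):

```python
# Rough size of the "don't replicate" data per LUN pair, per the example above.
vms         = 30
vswp_gb     = 1.0    # .vswp per VM, equal to assigned RAM
pagefile_gb = 1.5    # Windows pagefile VMDK on D:, assuming 1.5x RAM
skipped_gb  = vms * (vswp_gb + pagefile_gb)
print(skipped_gb)    # 75.0 GB that never has to cross the WAN if it lives on LUNB
```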
Thoughts?
One thing I didn’t make clear:
…you have “up to” 75GB of useless data to replicate…
I understand most of these blocks will be empty and therefore ignored, but on long-running VMs the pagefile will still build up substantially – I would hope the VSWP stays empty (or you have bigger immediate problems) – so let’s say an average of 20-30GB of data for our 30 VMs… still not insignificant, especially when you think about a company with a couple thousand VMs trying to replicate overseas.
Indeed. LUN sizing is a discussion that has been around for quite a while now.
I would recommend using the metrics as described above.
But I think another aspect needs to be considered.
Together with virtualization, we want to consolidate as much as possible without losing performance. When consolidating onto LUNs, one should also consider how many VMs stop functioning when LUN data corruption occurs.
Recently we had a client with VMs on 1TB LUNs, and one of those LUNs turned out to be corrupted. We were lucky we could still migrate the lot to another LUN. Therefore I would recommend not consolidating TOO many VMs onto a single datastore.
Should we therefore create a LUN per VM? No, absolutely not. But we should consider downtime, data loss and recovery in case of LUN corruption.
I do my sizing for my implementations at around 300-500GB per LUN. But that’s my experience :)
Semi-related
http://blogs.vmware.com/performance/2008/02/scalable-storag.html
In my experience LUN sizing depends on your queue depth at the SP and at the FC HBA. If you have set the queue depth too low and you try to oversubscribe it, you will have potential performance problems. We use a standard LUN size of 400GB, just making sure we do not oversubscribe the queue depth, and we have more than 30 VMDK files on it. So, as you mentioned earlier, there is no rule of thumb that determines LUN size, and I have to agree with that.
In my experience, you run into cluster filesystem locking problems long before you run out of IOPS on a good storage device.
I keep the lock contention at bay by limiting the number of VMs per VMFS to 25-30. With 30GB VMs, this works out to 750-900GB per LUN.
Our onsite VMware TAM and our EMC storage reps actually recommend less for busy VMs: 450-500GB per LUN, with one VMFS per target.
Our VMs are never all active at the same time, so we can get away with a higher density.
I can’t see how IOPS wouldn’t be a major issue with this, unless you’re running RAID groups with many spindles to give high IOPS capacity.
As a worst case, if you built a 3 x 1TB disk RAID-5 group and ran 30 VMs on it, you’ve got 30 servers contending for about 400 IOPS. Of course SAN cache performance etc. alleviates this, but surely there’d still be a major bottleneck.
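Roughly, the spindle math behind that worst case looks like the sketch below; the per-disk IOPS figure, read/write mix and RAID-5 write penalty are assumptions for illustration, not measured numbers.

```python
# Worst-case spindle math for a 3-disk RAID-5 group (illustrative assumptions).
spindles, iops_per_disk = 3, 130      # ~400 raw IOPS total, as in the example above
raw_iops   = spindles * iops_per_disk
read_ratio = 0.7                      # assumed 70/30 read/write mix
raid5_write_penalty = 4               # each frontend write costs ~4 backend IOs

effective = raw_iops / (read_ratio + (1 - read_ratio) * raid5_write_penalty)
print(raw_iops, round(effective), round(effective / 30, 1))  # 390, ~205 usable, ~6.8 IOPS per VM
```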
Gabe,
I have been using one LUN per datastore and everything seems to be running great. Do you see any real problems with doing this? Also, what have you seen in array sizes on the SAN? Do you see anything wrong with creating very large arrays and storing very small LUNs on each array? Are there any best practices you care to mention?
Thank you
I didn’t understand the sizing.