Don’t we all know the time you forgot to monitor your LUN’s available space? The 3 AM call from the monitoring room saying that a LUN has no free space left and VMs are crashing (VMs with active snapshots). We’ve had it too. We have now configured adequate monitoring that alerts when less than 20 GB of free space is left. But even so, a LUN can still run out of space.
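The 20 GB alert rule boils down to a simple threshold check. The sketch below assumes you already have each datastore's free space in bytes (from vCenter, a VI Toolkit script or similar). The `Datastore` class and `low_space_alerts` function are hypothetical names for illustration.

```python
from dataclasses import dataclass

GB = 1024 ** 3           # bytes per (binary) gigabyte
ALERT_THRESHOLD_GB = 20  # alert when free space drops below this

@dataclass
class Datastore:
    # Hypothetical record; fill it from whatever inventory source you use.
    name: str
    capacity_bytes: int
    free_bytes: int

def low_space_alerts(datastores, threshold_gb=ALERT_THRESHOLD_GB):
    """Return the names of datastores with strictly less free space than the threshold."""
    limit = threshold_gb * GB
    return [ds.name for ds in datastores if ds.free_bytes < limit]

if __name__ == "__main__":
    stores = [
        Datastore("LUN01", 500 * GB, 120 * GB),
        Datastore("LUN02", 500 * GB, 15 * GB),
        Datastore("LUN03", 250 * GB, 20 * GB),  # exactly at the limit: no alert
    ]
    for name in low_space_alerts(stores):
        print(f"ALERT: {name} has less than {ALERT_THRESHOLD_GB} GB free")
```

Run as-is, only LUN02 is reported, because the comparison is "less than", not "less than or equal".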
The problem is that when a LUN is full and you have to clean up snapshots, you also need extra space to remove them. This is not an easy task, certainly not for the average admin who looks after Windows servers, web servers, SQL servers and ESX servers as just part of the job and gets called in the middle of the night.
My colleague Arnim came up with the great idea of placing a 5 GB dummy vmdk on each LUN. Now, when a LUN runs out of space, you can delete the dummy vmdk, clean up the mess and free up space. Afterwards, of course, don’t forget to recreate the dummy for next time.
For the admin who gets called in the middle of the night, there is now the option to simply delete the dummy vmdk and postpone solving the real problem until first thing in the morning.
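The same reserve-file trick can be sketched on an ordinary file system. This is an analogue, not a vmdk: on ESX itself you would more likely create a thick vmdk with `vmkfstools`. The key point is that the zeros are actually written, because a file merely extended with `truncate()` can be sparse and then reserves no real space. The function names and path are hypothetical, and a compressing or deduplicating file system could still shrink a zero-filled file.

```python
import os

CHUNK = 1024 * 1024  # write in 1 MiB chunks to keep memory use small

def create_reserve_file(path, size_bytes):
    """Create a fully written (non-sparse) placeholder file of size_bytes."""
    zeros = bytes(CHUNK)
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(CHUNK, remaining)
            f.write(zeros[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # make sure the blocks really hit the disk

def release_reserve_file(path):
    """Delete the placeholder in an emergency; True if it existed and was removed."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        return False

if __name__ == "__main__":
    # Hypothetical path; 5 GB matches the dummy size used above.
    create_reserve_file("/datastore1/dummy/reserve.bin", 5 * 1024 ** 3)
```

At 3 AM the admin only has to call `release_reserve_file` (or just delete the file); recreating it afterwards is the same `create_reserve_file` call.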
(We also will be monitoring on the absence of the dummy file!)
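Monitoring the absence of the dummy file can be as simple as checking that each expected file exists and still has its full size. This is a sketch, assuming one dummy file per datastore at a path you choose (on an ESX service console, datastores appear under `/vmfs/volumes/`). The `check_reserve_files` name is hypothetical, and the size check looks at the logical file size only, not at how much space is actually allocated.

```python
import os
import sys

def check_reserve_files(paths, expected_size):
    """Return problem descriptions for missing or undersized placeholder files.

    An empty list means every placeholder is in place.
    """
    problems = []
    for path in paths:
        if not os.path.exists(path):
            problems.append(f"MISSING: {path}")
        elif os.path.getsize(path) < expected_size:
            problems.append(f"UNDERSIZED: {path}")
    return problems

if __name__ == "__main__":
    # Pass the expected dummy file paths on the command line.
    found = check_reserve_files(sys.argv[1:], 5 * 1024 ** 3)
    for problem in found:
        print(problem)
    sys.exit(1 if found else 0)  # non-zero exit lets a scheduler raise an alert
```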
Love to hear your thoughts on this.
Surely snapshots are a temporary measure and should be deleted once the reason for taking them has passed. Hence you should never run out of space, but just in case…
There is a nice VI Toolkit script here that emails you so you can track the amount of free disk space: http://www.peetersonline.nl/index.php/vmware/track-datastore-free-space/
MS Exchange does something similar automatically. There are two 5 MB pad files in each transaction log directory on the Exchange server. If the database engine determines that the disk is out of space, it deletes the files, writes out the pages in memory that are waiting to go into the transaction log(s), and then unmounts the affected databases.
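The pad-file sequence described above (delete the pads, flush what is pending, unmount) can be modelled in a few lines. This is a toy model that loosely follows that description, not Exchange's actual code. The `LogVolume` class, its method names and the MB bookkeeping are all hypothetical.

```python
class LogVolume:
    """Toy log volume that holds pad files in reserve (sizes in MB)."""

    def __init__(self, capacity_mb, pad_files=2, pad_size_mb=5):
        self.capacity_mb = capacity_mb
        self.pad_mb = pad_files * pad_size_mb  # space held by the pad files
        self.used_mb = self.pad_mb
        self.mounted = True

    def free_mb(self):
        return self.capacity_mb - self.used_mb

    def write_log(self, size_mb):
        """Write size_mb of log data; True for a normal write, False if it triggered unmount."""
        if not self.mounted:
            raise RuntimeError("database is unmounted")
        if size_mb <= self.free_mb():
            self.used_mb += size_mb
            return True
        # Out of space: delete the pad files to make room for the pending pages...
        self.used_mb -= self.pad_mb
        self.pad_mb = 0
        if size_mb > self.free_mb():
            raise RuntimeError("pending pages do not fit even after deleting the pads")
        # ...write them out, then unmount cleanly instead of crashing.
        self.used_mb += size_mb
        self.mounted = False
        return False
```

With a 100 MB volume, the pads hold 10 MB; after 85 MB of normal writes only 5 MB is free, so an 8 MB write releases the pads, lands safely and unmounts the database.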
I wonder if VMware could do something similar to bring down a system cleanly? Maybe suspend the VM?
We lay down a VMFS volume on each LUN, so each LUN is in effect “full” when deployed. What I need is to monitor the free space on the VMFS volume… any tips? The only things I have found are some scripts with scant documentation from their author.
Great idea!
We’re using a VMFS check within Nagios. Works great! Email or SMS notifications are also possible.
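The Nagios check mentioned above isn't shown, but a check in that style is easy to sketch. This hypothetical `check_vmfs_free.py` follows the usual Nagios plugin convention (exit code 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN, one status line with optional performance data after `|`). It assumes the datastore is reachable as a local path (for example under `/vmfs/volumes/` on an ESX service console) and that `shutil.disk_usage` reports its free space correctly there; both are assumptions, not something the commenter confirmed.

```python
import shutil
import sys

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # Nagios plugin exit codes
MB = 1024 ** 2

def evaluate(free_bytes, warn_gb, crit_gb):
    """Map free space to a (status, message) pair using GB thresholds."""
    free_gb = free_bytes / 1024 ** 3
    if free_gb < crit_gb:
        status, label = CRITICAL, "CRITICAL"
    elif free_gb < warn_gb:
        status, label = WARNING, "WARNING"
    else:
        status, label = OK, "OK"
    # Text before '|' is shown to humans; after it is performance data.
    return status, f"{label} - {free_gb:.1f} GB free | free={free_bytes // MB}MB"

def main(argv):
    if len(argv) != 4:
        print("UNKNOWN - usage: check_vmfs_free.py PATH WARN_GB CRIT_GB")
        return UNKNOWN
    try:
        path, warn_gb, crit_gb = argv[1], float(argv[2]), float(argv[3])
        usage = shutil.disk_usage(path)
    except (OSError, ValueError) as exc:
        print(f"UNKNOWN - {exc}")
        return UNKNOWN
    status, message = evaluate(usage.free, warn_gb, crit_gb)
    print(message)
    return status

if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

For example, `check_vmfs_free.py /vmfs/volumes/LUN01 20 10` would warn below 20 GB free and go critical below 10 GB.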
Interesting stuff. I wonder what would happen if you were deduping: would the fake file become useless?
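One way to reason about the dedup question is to count unique blocks the way a fixed-block dedup engine might. VMFS still counts the full dummy as used, so the margin inside the datastore survives. On the array, though, a zero-filled dummy would presumably collapse to almost nothing, so it would not protect the array's own physical capacity. The 4 KiB block size and SHA-256 hashing here are assumptions. Real arrays differ, and many also detect zero blocks directly.

```python
import hashlib
import os

BLOCK = 4096  # assumed dedup block size; real arrays vary

def unique_blocks(data, block_size=BLOCK):
    """Count distinct fixed-size blocks, i.e. what a block-level dedup engine would store."""
    digests = set()
    for offset in range(0, len(data), block_size):
        digests.add(hashlib.sha256(data[offset:offset + block_size]).digest())
    return len(digests)

if __name__ == "__main__":
    size = 1024 * BLOCK  # 4 MiB sample instead of a full 5 GB dummy
    zeros = bytes(size)
    random_data = os.urandom(size)
    print("zero-filled dummy:", unique_blocks(zeros), "unique block(s)")
    print("random-filled dummy:", unique_blocks(random_data), "unique block(s)")
```

The zero-filled sample reduces to a single unique block, while the random one keeps all 1024, which suggests filling the dummy with random data if it has to hold real space on a deduping array.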
I wonder if it would be possible to script a LUN resize at the SAN level, by a set percentage.
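The only generic part of such a script is the size arithmetic. The array-side resize command is vendor-specific and not shown, and the VMFS volume on top would still need to be extended afterwards. This sketch assumes whole-GB sizes and an optional allocation step. `grown_size_gb` is a hypothetical helper that uses integer arithmetic because floats give surprises such as `100 * 1.1` evaluating to `110.00000000000001`, which a ceiling would push up to 111.

```python
def grown_size_gb(current_gb: int, percent: int, step_gb: int = 1) -> int:
    """Size after growing by `percent`, rounded up to a multiple of step_gb.

    Integer arithmetic only, so no floating-point rounding creeps in.
    """
    numerator = current_gb * (100 + percent)
    denominator = 100 * step_gb
    steps = -(-numerator // denominator)  # ceiling division on integers
    return steps * step_gb

if __name__ == "__main__":
    print(grown_size_gb(500, 10))             # grow a 500 GB LUN by 10 %
    print(grown_size_gb(500, 10, step_gb=64)) # same, on an array that allocates in 64 GB steps
```

The step parameter covers arrays that allocate in fixed increments; with the default of 1 GB, a 10 % increase on 500 GB is exactly 550 GB.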
Roger