Virtual Desktop Infrastructure (VDI) is becoming a hot topic in the wake of server virtualisation. It was one of the areas that generated the most attention at the recent IPExpo 2010 in October and certainly looks to gain momentum in the following few years as companies complete their server virtualisation programmes and look to transfer that knowledge to the desktop. However, assuming that server virtualisation knowledge is enough to then go and virtualise the PCs in your environment is where you may well come unstuck. There is a killer named IOPS lurking in the shadows, which you ignore at your peril!
Firstly we must understand the problem before tackling desktop virtualisation.
The initial startup and login
A Windows PC running on a local hard drive generally uses an IDE or SATA drive running at 5400 or 7200 rpm. Typically these disks can deliver 40 or 50 IOPS (Input/Output Operations Per Second), which is enough for a single user. When a PC starts it loads the basic OS and a number of services. Many of these services, such as indexing and prefetching, exist specifically to optimise a physical disk environment and produce a lot of IOPS. That may benefit a single disk system, but when multiplied across many virtual desktops it can cripple a virtual desktop system.
The number of IOPS a Windows client produces varies according to the applications loaded and how heavy the use is, but generally 8 to 10 IOPS for the average user and 14 to 20 IOPS for heavy users is not too far off the mark. However, what is not generally accounted for is that the majority of these IOPS are WRITES. In tests it has been found that the ratio of reads to writes can be as skewed as 10/90 percent.
Why is this important? Well, most storage vendors don’t even mention IOPS and if they do they tend to quote the faster read IOPS figures, so you have to be very careful when sizing a storage system for VDI.
When implementing a VDI system it should therefore be optimised for write IOPS, as this is the principal bottleneck in a virtualised system. The actual startup of the virtual desktops involves a lot of reads, but this can be mitigated by pre-starting the required number of desktops before working hours; the real trouble comes when users log on. The impact on IOPS when users log on can be considerable, as multiple users all call the same login scripts and access similar parts of their disk image at the same time. Write IOPS per user can typically be twice that of a normal production session, and are multiplied up by the sheer number of users logging on at the same time. Read IOPS too can spike sharply, and if you have an environment where everybody logs on at exactly 9AM then you have to account for this if the logon process is not to become unacceptably slow.
On top of this, getting past the first login is not the end of it, as you then have the first start-up of all the various applications in use. There is therefore another spike in IOPS around this time, which tends to average 50/50 in terms of reads and writes. However, once this settles down the reads may drop by a factor of 5 while the writes tend to stay the same! This moves the read/write ratio into the 20/80 range, and as write IOPS are the killer this is where your VDI implementation can really get hit.
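As a rough check on those ratios, the sketch below starts from an application start-up phase with equal reads and writes and applies the fivefold drop in reads. The function name and the figure of 8 IOPS each way per user are illustrative assumptions, not measurements.

```python
def settled_ratio(read_iops, write_iops, read_drop_factor=5):
    """Return (read_share, write_share) once reads fall and writes hold steady."""
    settled_reads = read_iops / read_drop_factor
    total = settled_reads + write_iops
    return settled_reads / total, write_iops / total


# Assumed start-up load per user: 8 read IOPS and 8 write IOPS (a 50/50 split).
read_share, write_share = settled_ratio(8, 8)
print(f"reads {read_share:.0%}, writes {write_share:.0%}")
```

Under these assumptions the split settles at roughly 17/83, which is in the 20/80 range described above.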
All the I/O from a virtualised client needs to come from shared storage, and as many clients read and write simultaneously the I/O is 100% random.
Therefore the type of disk can have quite a bearing on the performance of your VDI solution. A 15k SAS disk can typically handle 180 IOPS, although this is a gross figure and the storage system involved can reduce this by as much as 30%. Lower end SATA disks only handle in the order of 50 IOPS, so you can immediately see how much of a bearing this can have on performance. However there is also another potential hazard in a large storage system: the RAID level used to get the single disks working together to provide greater storage and throughput.
RAID 5 works by writing data across a set of disks, then calculating the parity for that data and writing the parity onto one of the disks in the set. This parity block is written to a different disk in the set for every further block of data, which is great for redundancy as you can lose an entire disk without losing any data, but is expensive in terms of performance.
To write to a RAID 5 set the affected blocks are first read, the changed data is then applied and a new parity is calculated, before the blocks are written back to disk. With large RAID 5 sets this means write I/O is many times slower than read I/O, and even with 15K rpm disks the potential write IOPS per disk are only in the order of 35-45 when read is nearer 160.
RAID 1, or mirroring, requires two blocks to be written for every write request – one to each disk in the mirror. The data does not need to be read first as there is no parity, so it is quicker than RAID 5 for writes, but is still expensive in that potential IOPS for a 15K disk are dropped to 70-80 per disk.
RAID 0, or striping, is the fastest as blocks of data are written in sequence to all disks. However because there is no copy or parity if a single disk fails in the set the entire set is lost. This should therefore only be used for volatile information, but at 140-150 IOPS does provide the best write speed.
For the best combination of speed and protection therefore a mixture of striping and mirroring, commonly called RAID 10 or RAID 1+0, is often utilised. This however is incredibly expensive in the number of disks, and you still get a write overhead of 50% compared to read across the RAID set. This can sometimes be mitigated by using an external mirroring appliance to do the mirroring operation, as opposed to the disk system itself. This means RAID 0 can be used at the disk level with one chassis mirrored to another by the appliance. This allows for maximum write speed to the physical disk while the mirroring device handles the write acknowledgements back to the calling application so it is not waiting for the second write. This can require a large cache in a busy system.
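To put the RAID figures side by side, here is a small sketch that tabulates write IOPS per 15K disk against the read figure of about 160. It uses the upper end of each range quoted above, and treating RAID 10 the same as RAID 1 per disk is an assumption based on the 50% write overhead mentioned; the names are illustrative only.

```python
# Approximate per-disk IOPS for a 15K disk, upper figures from the text above.
READ_IOPS = 160
WRITE_IOPS = {
    "RAID 0": 150,
    "RAID 1/10": 80,   # assumption: RAID 10 behaves like RAID 1 per disk
    "RAID 5": 45,
}


def write_to_read_ratio(level):
    """Write IOPS as a fraction of read IOPS for one disk at the given RAID level."""
    return WRITE_IOPS[level] / READ_IOPS


for level in WRITE_IOPS:
    print(f"{level:<10} {WRITE_IOPS[level]:>4} write IOPS  "
          f"{write_to_read_ratio(level):.0%} of read")
```

On these numbers a RAID 5 disk delivers under a third of its read IOPS when writing, which is why the RAID level matters so much for a write-heavy workload.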
Because it is imperative to minimise IOPS from the storage, every I/O has to be as efficient as possible. Disk alignment is an important factor in this and surprisingly often overlooked.
Data is not read from the storage byte by byte, but in blocks of 32k, 64k or 128k depending on the vendor, hence the term “block storage”. If the filesystem on top of these blocks is not perfectly aligned with the blocks, an I/O from the filesystem can result in two I/Os from the storage. If that filesystem is on a virtual disk and the virtual disk sits on a misaligned filesystem then it can result in THREE I/Os per client I/O! This is why it can be incredibly important to address this: combined with RAID overhead it is possible for write IOPS to be less than 10% of the read IOPS in such a system, and as VDI is mainly write IOPS it becomes a BIG problem!
Windows 7 does try to align partitions at the 1MB boundary, but Windows XP does not, immediately creating the situation described above. VMware ESX also misaligns partitions if the VMFS is created through the service console; creating it through vCenter server corrects this, so be aware of how the base system is configured.
The gain from correctly aligning disks can be 30-50% for random I/O, which means VDI can be substantially affected by this.
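The effect of misalignment can be illustrated with a little arithmetic. The sketch below counts how many storage blocks a single block-sized I/O touches for two partition start offsets. The 64k block size, the sector 63 start (used here as a typical XP-style default) and the function name are assumptions for illustration.

```python
def blocks_touched(start_byte, length, block_size):
    """Count the storage blocks covered by an I/O of `length` bytes at `start_byte`."""
    first = start_byte // block_size
    last = (start_byte + length - 1) // block_size
    return last - first + 1


BLOCK = 64 * 1024            # assumed storage block size
XP_OFFSET = 63 * 512         # partition starting at sector 63 (assumed XP-style default)
WIN7_OFFSET = 1024 * 1024    # partition starting at the 1MB boundary

for name, offset in (("sector 63", XP_OFFSET), ("1MB", WIN7_OFFSET)):
    print(name, blocks_touched(offset, BLOCK, BLOCK))
```

With the sector 63 start the I/O straddles two storage blocks, while the 1MB start keeps it within one.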
Prefetch and defrag
NTFS on Windows works best if I/O is contiguous, and this is often achieved by defragging. However in a VDI system the main C: drive of the client is refreshed from the master disk on every reboot, so a defrag should only be carried out right at the end of the build process for the gold master disk and disabled on the client itself, or it will kill I/O for no benefit.
Similarly prefetching on Windows should also be disabled. This is because it tries to serialise the I/Os to make them more efficient, but on a system with hundreds or thousands of clients doing the same thing the result is random I/O anyway. As well as this if deduplication is used on the disks the prefetch moving files around will actually hinder rather than benefit the system.
How many disks?
The number of IOPS a client uses is very dependent on the users and their applications, but on average is around 8-10 per client with a read/write ratio between 40/60 and 20/80. This is assuming the base image is optimised to do as little as possible (which I will cover in a later blog), and all the I/Os come from the applications and not the OS.
Assuming around 65 clients per host (a host being a twin CPU quad core server with 48GB of RAM as a minimum) at 10 IOPS each, this would require 650 IOPS, of which 80% (520) are writes and 20% (130) are reads. With RAID 5 that means you would need (520/45) + (130/160), or ~13 disks, for every 65 clients. Therefore for 1000 VDI clients you would be looking at 200 disks. This would be roughly halved with RAID 10 due to the greater write I/O available. This is counter-intuitive, as it is generally thought you need more disks for RAID 10 than for RAID 5, but it just goes to show what a different paradigm VDI is: we are not dealing with data quantity, but raw performance. Even so, RAID 10 still only gives around 10 clients per disk just for the basic disk storage system, so it is hardly economic.
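The sizing above can be reproduced with a few lines of code. This is a minimal sketch: the function name is made up for illustration, and the RAID 10 write figure of 80 IOPS per disk is taken from the top of the 70-80 range quoted earlier for mirroring.

```python
import math


def disks_needed(total_iops, write_percent, write_iops_per_disk, read_iops_per_disk):
    """Disks required to serve the write and read share of a given IOPS load."""
    writes = total_iops * write_percent / 100
    reads = total_iops - writes
    return math.ceil(writes / write_iops_per_disk + reads / read_iops_per_disk)


host_iops = 65 * 10                            # 65 clients at 10 IOPS each
raid5 = disks_needed(host_iops, 80, 45, 160)   # RAID 5 on 15K disks
raid10 = disks_needed(host_iops, 80, 80, 160)  # RAID 10, assuming 80 write IOPS per disk
print(raid5, raid10)

# Scale the per-host RAID 5 figure to 1000 clients, as in the text.
print(raid5 * 1000 / 65)
```

Changing the write percentage or the per-disk figures shows how sensitive the disk count is to the write share of the workload.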
SSDs as an alternative for VDI
Solid State Disks (SSDs) are actually more like large memory sticks than disks. The advantage is that they can handle a huge amount of IOPS (tens of thousands) and have no moving parts, so latencies on data access are measured in microseconds rather than milliseconds. Also the power consumption of SSD disks is only a fraction of their spinning magnetic cousins.
SSDs do have much higher read IOPS potentials than write, for example an Intel X25-E SSD drive can handle 30,000 read IOPS but ‘only’ 3,300 write IOPS. However given that SSDs are now less than 4 times the price of a regular 15K disk but can produce 20 times the write IOPS they are well suited to a VDI environment on cost grounds as well as performance. There are disadvantages with lifecycles, which will improve with time, but even so if disks were to be replaced every year – and getting cheaper, faster, and more durable on each replacement cycle, they still far undercut regular magnetic disks in terms of cost.
To approach the example above a 1000 desktop VDI system placed on SSDs in a RAID 10 configuration could be achieved with just 6 disks – less than half the amount of 15K disks required to support one 65 client host. So even if the SSDs were to be replaced like for like each year (unlikely with a lifetime write of 2 Petabytes per disk) it would take 5 years to spend the same on disks compared to a magnetic disk solution – excluding all the extra chassis and power expenditure during that period!
Bottom line though you would be supporting over 165 clients per disk – THAT is economic! In-line deduplication would need to be used to address the low amount of storage an SSD can provide, but this is possible today as VDI is basically a single cloned image and all 1000 desktops would be identical, with only the individual user data being run through deduplication to the remaining SSD disk space. All other application data would be held on standard magnetic storage where high IOPS are not such an issue – as with the standard server virtualised environment.*
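For comparison, the same style of sizing can be run against the SSD figures. The sketch below makes two assumptions that are not stated in the text: that mirroring halves the usable write IOPS per disk, and that the count is rounded up to whole mirrored pairs. Under those assumptions it arrives at the same 6 disks for 1000 clients.

```python
import math

SSD_READ_IOPS = 30_000   # Intel X25-E read figure quoted above
SSD_WRITE_IOPS = 3_300   # Intel X25-E write figure quoted above


def ssd_raid10_disks(clients, iops_per_client=10, write_percent=80):
    """Estimate SSDs needed in RAID 10 for a given number of VDI clients."""
    total = clients * iops_per_client
    writes = total * write_percent / 100
    reads = total - writes
    # Assumption: mirroring halves the usable write IOPS of each disk.
    raw = writes / (SSD_WRITE_IOPS / 2) + reads / SSD_READ_IOPS
    disks = math.ceil(raw)
    return disks + disks % 2  # assumption: round up to whole mirrored pairs


disks = ssd_raid10_disks(1000)
print(disks, round(1000 / disks))
```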
Further advances – serialising random writes
The biggest overhead in a VDI system is random write I/O. Therefore if it were possible to serialise this random I/O then the amount of IOPS the system could handle would increase dramatically. This requires the storage system to cache the random I/O and then write to disk sequentially, and when combined with SSDs this can push IOPS into the 100,000s and support thousands of desktops on a single 2U storage device, consuming a couple of hundred watts of power. There are vendors such as Whiptail that offer such systems today, and this is likely to be a feature that will appear more regularly as SSD uptake increases.
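The idea of turning random writes into sequential ones can be illustrated with a toy example. The sketch below buffers a batch of block addresses and groups them into contiguous runs, so that each run could be written in one sequential pass. It is only an illustration of the principle, not a description of how Whiptail or any other vendor implements it, and the function name is hypothetical.

```python
def coalesce(block_addresses):
    """Group buffered block writes into contiguous (start, end) runs."""
    runs = []
    for addr in sorted(set(block_addresses)):
        if runs and addr == runs[-1][1] + 1:
            runs[-1][1] = addr          # extend the current run
        else:
            runs.append([addr, addr])   # start a new run
    return [(start, end) for start, end in runs]


# Six writes arriving in random order collapse into two sequential runs.
print(coalesce([7, 3, 4, 9, 8, 5]))
```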
Calculating the amount of storage needed in order to properly host VDI is a practice not to be taken lightly. The main bottleneck is IOPS, particularly write IOPS, and because write I/O is more costly than read on any storage system the number of disks required increases accordingly. There are ways in which this can be mitigated, but there is no substitute for planning your desktop virtualisation project and having a good idea of just what you will need; that can make the difference between a successful and a failed project.
We will follow shortly with another blog in the desktop virtualisation series, looking at how you can cut down a Windows 7 client so it uses the minimum amount of resources, and maximise the resources of your host platform.
11/01/2011 – Now available! (http://blog.millennia.it/making-a-leaner-fitter-windows-7-virtual-desk)
* Application virtualisation will also add to IOPS as applications are delivered from centralised application servers to hundreds or thousands of users. Therefore it is advised that the server handling application delivery is also running from storage with a high IOPS potential, although these will generally be read IOPS and therefore easier to deliver from the main storage system.