Can Flash Become “General” Storage At Last?


Flash storage has been around for a while now, but to date it has remained too expensive for most organisations, even though the performance benefits it offers are so compelling. However, with MLC prices continuing to fall and MLC lifetime and reliability steadily improving, the time for flash to enter the mainstream SMB market may finally be upon us.

Problems faced by traditional HDD

Hard disk drives (HDD) are electro-mechanical devices subject to the laws of physics. They are spinning disks of magnetic media that have hit a wall at 15,000 rpm, a speed that has been common in high-end SAS drives for years and is unlikely to increase. With no way forward on spindle speed, the only route to squeezing out the last bit of performance has been to pack data more tightly as capacities have grown, which has led to better throughput.

However, this only helps sequential reads and writes; any randomness in the I/O soon cuts a spinning disk's performance off at the knees. And what is the biggest source of random I/O today? Virtualisation, the direction everybody is now moving en masse.

On top of this, HDD require a lot of power to maintain those spin speeds, and although some “green” models can spin down when idle, under RAID and virtualisation they rarely get any quiet time. Which brings us to yet more overhead that poor old HDD has to cope with: RAID.

RAID accelerates throughput and I/O by binding multiple disks to act as one, but only when everything is healthy. The ever-increasing capacity of those disks exposes a growing problem with RAID sets: when a disk fails, the RAID has to rebuild a spare from the parity information, and on a RAID 5 group of 2TB drives rebuild times can be measured in days rather than hours. Heaven help you if another drive fails in that window…
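
As a rough illustration of why those rebuild times balloon, here is a back-of-envelope Python sketch; the effective rebuild rates are assumptions, since a busy production array usually throttles rebuild I/O to well below a drive's raw sequential speed.

```python
# Back-of-envelope rebuild time for a failed 2TB drive, at assumed
# effective rebuild rates (an idle array vs. an increasingly busy one).
drive_tb = 2.0

for rate_mb_s in (100, 30, 10):
    hours = (drive_tb * 1024 * 1024) / rate_mb_s / 3600
    print(f"{rate_mb_s:>3} MB/s -> {hours:5.1f} hours ({hours / 24:.1f} days)")
```

At an assumed 10 MB/s of rebuild bandwidth on a heavily loaded array, that single 2TB rebuild stretches to well over two days.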

This has led to increased use of double-parity RAID 6 to mitigate the dreaded error-during-rebuild scenario, but the extra parity information in turn lengthens rebuild times, as well as slowing writes further even during optimal operation.

Virtualisation killed the magnetic disk

OK, that's a bit extreme; the death of tape has been called for years and it still exists. But the rapid uptake of server virtualisation created the initial problem with a surge in random I/O, and now desktop virtualisation, with its far higher IOPS requirement, is really showing the age of spinning disks. Many VDI implementations, some for very large companies and costing millions, have failed to scale as expected, and nearly all of those failures came down to poor storage subsystem performance.

No matter how good the storage subsystem it all comes down to the magnetic disk properties eventually, and you can’t get past physical laws with fancy programming – no matter what a vendor may tell you.

One of the big issues with virtualisation is LUN oversubscription. This is where differing workloads all call on the storage subsystem at the same time; if a mission-critical VM wants to write to a LUN while a lazy, lower-priority VM is reading from it, the mission-critical VM may have to wait, because the storage often has no idea about priorities. VMware has features such as disk shares and Storage I/O Control, but these are still often not granular enough to be completely effective.
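
To see why priorities matter, here is a toy Python sketch of a shared LUN queue; the per-I/O service time and queue depth are made-up figures purely for illustration, not a model of any particular array.

```python
# Toy illustration of LUN oversubscription: a shared FIFO queue on the
# array has no notion of VM priority, so a critical write queues behind a
# burst of low-priority reads.  Service time per I/O is an assumed figure.
from collections import deque

service_ms = 8.0                             # assumed per-I/O service time on 15K HDD
queue = deque(("low-priority read",) * 32)   # lazy VM has already queued a burst
queue.append("mission-critical write")       # our important I/O arrives last

wait = 0.0
while queue:
    op = queue.popleft()
    if op == "mission-critical write":
        print(f"Critical write waited {wait:.0f} ms before being serviced")
        break
    wait += service_ms                       # FIFO: everything ahead is served first
```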

And if you happen to be using SATA drives, with their smaller buffers and shallower command queues, because the SAS alternatives were just too expensive to get through the project board… well, how expensive is a storage timeout on your database?

Workarounds to keep HDD going

The usual answer to “We have a problem” is “Well, fix it!”. So we enact a number of workarounds to keep our systems going.

The most common approach is simply to throw more and more disks at the problem, deriving extra performance from ever-bigger RAID arrays. However, the law of diminishing returns means each extra HDD contributes less than the one before, until you can find yourself throwing bucketfuls of expensive hardware at a 10% I/O increase. Who has the budget for that these days?
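
To get a feel for how quickly the spindle count mounts, here is a rough sizing sketch in Python; the per-disk IOPS figure, the read/write mix and the RAID 5 write penalty are assumptions for illustration rather than measured values.

```python
# Rough sizing sketch: how many 15K spindles are needed to hit a target
# random-IOPS figure once the RAID write penalty is included.
import math

disk_iops = 175            # typical 15K SAS drive on random I/O (assumption)
read_ratio = 0.6           # 60/40 read/write mix (assumption)
raid_write_penalty = 4     # RAID 5: one logical write = 4 disk operations

def spindles_needed(target_iops):
    backend = target_iops * (read_ratio + (1 - read_ratio) * raid_write_penalty)
    return math.ceil(backend / disk_iops)

for target in (5000, 10000, 20000):
    print(target, "host IOPS ->", spindles_needed(target), "drives")
```

Even a modest 10,000-IOPS virtualised workload lands you well past a hundred spindles on those assumptions, most of whose capacity you never needed.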

Some systems use short-stroking, which basically means using only the outer, faster portion of each disk to boost performance. However, this is incredibly wasteful and makes the problem of HDD sprawl even worse.

These are just sticking plasters, we need a new way…

Along comes Flash – but oh the expense!

Flash solid state disks (SSD) have the performance. With no moving parts they are just as efficient at random I/O as at sequential, and with lower seek times and latency they deliver far higher performance than any spinning disk could hope to match. Single-level cell (SLC) drives are the cream of the crop, with very high lifetimes and performance, and a price tag to match: between 10 and 20 times that of a 15K HDD on a per-terabyte basis.

Because of this, SLC is often used only as cache, providing a low-latency staging area in front of the disks behind it. The problem with cache is that if the data isn't in it when required, it has to come from HDD; and with writes, the incoming data can fill the cache, at which point performance plummets while the cache is flushed to disk.
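
The effect is easy to see with a simple weighted-average latency calculation; the SSD and HDD latencies below are assumed, illustrative figures.

```python
# Why cache only helps while the working set fits: average service time is
# quickly dominated by the miss path once the hit rate drops.
ssd_latency_ms = 0.2   # assumed cache (SSD) latency
hdd_latency_ms = 8.0   # assumed backing HDD latency

for hit_rate in (0.95, 0.80, 0.50):
    avg = hit_rate * ssd_latency_ms + (1 - hit_rate) * hdd_latency_ms
    print(f"hit rate {hit_rate:.0%}: average latency {avg:.2f} ms")
```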

So cache has a place but again is really just a sticking plaster over the old problem.

Another method is to tier data by required performance, installing SSD as Tier 0 with cascading levels of performance further down the tiers in the form of SAS and SATA HDD at spindle speeds down to 5,400 rpm. This provides a hybrid solution that balances performance with capacity, but it requires a lot more management. You also have to get your data sizing just right, or all that expense goes out of the window as the system bottlenecks again.

Of course, all this management means specific systems designed to automate the process, and often you will find all the storage has to come from a single vendor, at a premium price.

MLC Flash comes of age

A better solution comes in the form of multi-level cell (MLC) drives, which are cheaper to produce but generally offer lower performance than SLC and a shorter lifetime, due to the limited number of times each cell can be rewritten. That endurance is improving, but the way the media is used is the key to launching it into the mainstream as a storage medium of choice.

Several vendors have found ways of using this media that give it lifetimes matching conventional disk. The most common is to deduplicate inline, reducing the actual amount of data stored, and then serialise the data through the storage interface so it can be written intelligently, minimising media wear through “wear levelling” algorithms. WhipTail has had such a device out for a while, but where one leads others are bound to follow.
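
As a flavour of the idea, here is a minimal wear-levelling sketch in Python that steers each write to the least-erased block; it is purely illustrative, not WhipTail's or anyone else's actual algorithm.

```python
# Minimal wear-levelling idea: always program the erase block with the
# lowest erase count, so no cell wears out ahead of the rest.
import heapq

class WearLeveller:
    def __init__(self, num_blocks):
        # (erase_count, block_id) pairs kept in a min-heap
        self.blocks = [(0, b) for b in range(num_blocks)]
        heapq.heapify(self.blocks)

    def write(self, data):
        erases, block = heapq.heappop(self.blocks)   # least-worn block
        # ... erase the block and program `data` into it ...
        heapq.heappush(self.blocks, (erases + 1, block))
        return block

wl = WearLeveller(num_blocks=8)
for i in range(20):
    wl.write(f"payload {i}")
print("Erase counts:", sorted(count for count, _ in wl.blocks))  # evenly spread: [2,2,2,2,3,3,3,3]
```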

One vendor that has exploded onto the market recently is Nimbus, with their new line of S-class 100% flash-based storage. The Nimbus comes as fully modular shelves of drives in 100GB, 200GB and 400GB capacities, giving shelf capacities of around 2.5TB, 5TB and 10TB respectively. Usable capacity may be quite a bit less due to system configuration, but this is offset by inline deduplication of up to 10x depending on the base data.

The maximum-capacity shelf can produce 800,000 random IOPS, and a full rack up to 1.65 million random IOPS, the maximum being constrained largely by the ability to transmit that many transactions in a single second, although I'm sure that will yet increase.

Aha, but what about the cost? I'll bet that is budget-busting…

Their entry-level 2.5TB usable shelf starts at just $24,995 – about the same price and capacity as a Dell EqualLogic PS6000XV in RAID-10 format, but with over 50x the performance.

Still interested?

How the heck?….

Yep, I wondered too, but it is all about the way the system is built and used.

Nimbus have designed their own Extended Multi-Level Cell (EMLC) “flash blades”. At 30,000 erase cycles these have ten times the lifetime of standard MLC drives, or about a third that of SLC drives.

For those worried that this might still not be enough for enterprise use, the way Nimbus use these drives means that to burn them out within 5 years you would have to write over 7TB an hour to them, continuously! And as drives get cheaper, even the prospect of a 5-year lifetime becomes a moot point; how many of you have swapped a regular HDD within 2 or 3 years?
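
That 7TB-an-hour figure is easy to sanity-check; assuming the fully populated 10TB shelf and ignoring write amplification and over-provisioning, a rough calculation looks like this:

```python
# Rough endurance estimate from the quoted figures (assumptions: 10 TB of
# EMLC flash rated at 30,000 erase cycles, no write amplification or
# over-provisioning factored in).
capacity_tb = 10.0          # fully populated shelf (assumption)
erase_cycles = 30000        # quoted EMLC endurance
lifetime_years = 5

total_writes_tb = capacity_tb * erase_cycles    # ~300,000 TB before wear-out
hours = lifetime_years * 365 * 24               # 43,800 hours in 5 years
print(f"Sustained write rate to wear out in {lifetime_years} years: "
      f"{total_writes_tb / hours:.1f} TB/hour")  # ~6.8 TB/hour, continuously
```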

The on-board HALO OS controls the wear-levelling algorithms and garbage collection to provide consistent system performance, as well as delivering all the enterprise functionality you would expect: inline deduplication, storage virtualisation, thin provisioning, snapshots, mirroring (sync/async), and multi-protocol access from iSCSI to InfiniBand (unfortunately not AoE – I may work on them about that one 😉 ).

Oh, and remember the agonising rebuild times with HDD in the event of a drive failure? With Nimbus you are back into minutes, even with RAID 6 protection as standard.

Power to the people

One area often overlooked in a lifetime ROI study of a storage system is running costs. SSD consume far less power, thanks to the lack of moving parts and the removal of the HDD sprawl needed to meet performance requirements. Power consumption can be as much as 90% lower than traditional 15K disks, at around 5W per TB. This will only become more important as energy costs seem to move in one direction these days, especially in Rip Off Britain.
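
A quick back-of-envelope comparison shows what that means for the electricity bill; the HDD wattage and the tariff are assumptions, and only the 5W-per-TB flash figure comes from above.

```python
# Rough annual running-cost comparison for 10TB of storage.
flash_w_per_tb = 5.0      # quoted figure for flash
hdd_w_per_tb = 50.0       # assumed: many 15K spindles plus controller overhead
capacity_tb = 10.0
pence_per_kwh = 15.0      # assumed UK tariff

def annual_cost_pounds(watts):
    kwh = watts * 24 * 365 / 1000
    return kwh * pence_per_kwh / 100

print(f"Flash:   £{annual_cost_pounds(flash_w_per_tb * capacity_tb):.0f}/year")
print(f"15K HDD: £{annual_cost_pounds(hdd_w_per_tb * capacity_tb):.0f}/year")
# On these assumptions the flash system's running cost is ~90% lower.
```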

The rush to server virtualisation has made great gains by removing power-hungry servers that weren't delivering much but were still consuming a lot of power. Some of that saving was eaten by the beefier storage systems needed to service this virtualised world, but now here too we have a chance to make radical cuts.

A 200-VM virtualised estate running on flash storage can now extend the power savings beyond 85%, in both power delivery and the resulting cooling requirements, compared with the same 200 physical servers with local storage.

Data centre emissions reduction target – tick.

When considered alongside falling purchase costs, an all-flash system can already be significantly cheaper than its HDD competitor over a 5-year life cycle, and this can only get better.

In conclusion

 

| | 2000 | 2005 | 2010 | Today |
|---|---|---|---|---|
| CPU | 1x (Pentium 4 1.5 GHz) | 5x (Pentium D 2.6 GHz) | 15x (Nehalem Quad 2.6 GHz) | 30x (Octal 2.6 GHz) |
| DRAM | 1x (DDR1 PC-2100) | 4x (DDR2 PC2-4200) | 8x (DDR3 PC3-8500) | 16x (DDR3 PC3-17000) |
| Network | 1x (100Mb Ethernet) | 10x (Gigabit Ethernet) | 100x (10 Gigabit Ethernet) | 400x (40 Gigabit Ethernet) |
| Bus | 1x (PCI 32-bit/33 MHz) | 15x (PCIe Gen1 x8) | 30x (PCIe Gen2 x8) | 60x (PCIe Gen3 x8) |
| Disk (IOPS) | 1x (15K SAS HDD) | 1x (15K SAS HDD) | 1x (15K SAS HDD) | 50x – 100x (SSD Flash Drive) |

Original source: Nimbus (www.nimbusdata.com)

Finally it seems that flash is ready to enter the arena as mainstream storage, perfectly placed for the next major upswing in virtualisation driven by cloud uptake and VDI gaining more traction. Once a significant number of organisations have seen what is possible, and at such a reasonable cost, the development of this technology will only bring prices lower.

Traditional HDD still has a long run ahead of it; in areas of high capacity and serial I/O such as backup and archiving it is well placed to complement flash, which has a long way to go before petabytes of archive storage look sensible on that platform. For general production use in the mainstream market, however, the cost of flash has now been removed as a barrier to entry.

Now, at last, storage can catch up with the Moore's Law advances that CPU and RAM have made over the last decade or so.

Flash has arrived – make it so.

For more info contact us at info@millennia.it, and we can discuss your storage requirements and design an SSD/HDD balanced system that meets your needs, and your pocket.
