In 2016, Flash Changes Everything!*

*If by ‘everything’, you mean the media that sits inside of enterprise storage systems.

At an event in Paris, a customer asked, “Do you know what I like best about all-flash storage?” Since I had been warned that the French are sensitive, I resisted saying, “It doesn’t go on strike?” (At the time there were both petrol and air traffic controller strikes – in other words, a normal week in Paris.) His answer was disarmingly honest: “Everything else – cloud, hyperconverged infrastructure, containers – confuses me. But all-flash storage? It’s different, but I can understand it.”

While flash doesn’t disrupt the storage systems market, it is driving the evolution of storage systems. The evolution spans system design, vendor business models, and customer behavior. This time, let’s talk about basic storage system design.

The Evolution – System Design

Flash doesn’t change what storage systems do, but it does change how they do it. Flash storage systems enable applications and users to write and read data via a variety of protocols and networks – file, block, object, FICON, etc. They attempt to ensure that what a user stores is what the user reads. If that sounds like the functionality of disk and hybrid arrays, it is. Underneath, however, storage systems have changed how they do space optimization and how they make the media reliable.

Space Optimization

Disk storage systems make trade-offs between performance and space optimization. Space efficiency features like compression, deduplication, and clones incur costs: increased response time, management complexity, or unpredictable system performance. For decades, storage systems have optimized performance by laying out data in optimal locations on the disk. Space optimizations disrupt those carefully tuned algorithms: they fragment data, which increases the number of disk seeks and degrades performance. As a result, disk systems implement space efficiency features only for specific workloads (e.g., backup, archive, VDI) or as best-effort background tasks, not as inline operations for general-purpose usage.
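To make the seek penalty concrete, here is a minimal Python sketch under a deliberately simplified model (an assumption, not how any particular array schedules I/O): reading a file in logical order costs one seek for the first block plus one more for every jump to a physically non-adjacent block. The function name `count_seeks` and the block numbers are hypothetical.

```python
def count_seeks(physical_blocks: list[int]) -> int:
    """Count head repositionings needed to read blocks in logical order.

    Simplified model: a seek is needed for the first block and whenever the
    next block is not physically adjacent to the previous one.
    """
    seeks = 0
    previous = None
    for block in physical_blocks:
        if previous is None or block != previous + 1:
            seeks += 1
        previous = block
    return seeks


# A file laid out contiguously: one seek, then a sequential read.
contiguous = [100, 101, 102, 103, 104, 105]

# The same file after deduplication: some logical blocks now point at shared
# copies stored elsewhere, so the read jumps around the platter.
deduplicated = [100, 101, 7, 250, 104, 42]

print(count_seeks(contiguous))    # 1
print(count_seeks(deduplicated))  # 5
```

On flash, the same fragmented layout costs no mechanical seeks at all.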

All-flash storage systems both require and enable ubiquitous space efficiency. Flash delivers much greater I/O density than disk, but to make it cost-effective, systems need to increase flash’s capacity density. While not all space efficiency techniques apply to all workloads, every flash array must make space efficiency features part of its toolkit. Conversely, flash storage makes it possible to deliver inline, ubiquitous space optimization. The data may still fragment, but flash has no disk seeks, so its random I/O performance does not suffer; therefore, you can have space optimization and performance!
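A rough illustration of what “inline” space efficiency means: the work happens on the write path, before data reaches the media. The sketch below is hypothetical and assumes fixed 4 KiB chunks, SHA-256 fingerprints, and zlib compression; real arrays may use different chunking, hashes, and compressors, and some also compare bytes when fingerprints match. Reads then follow a block map to wherever the unique copy lives, which is exactly the scattered access pattern that flash tolerates and disk does not.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # assumed fixed block size for this sketch


class InlineDedupStore:
    """Hypothetical toy store that deduplicates and compresses on the write path."""

    def __init__(self) -> None:
        self.unique_blocks: dict[str, bytes] = {}  # SHA-256 fingerprint -> compressed block
        self.block_map: list[str] = []             # logical block address -> fingerprint
        self.logical_bytes = 0                     # bytes the host believes it wrote

    def write(self, data: bytes) -> None:
        for offset in range(0, len(data), BLOCK_SIZE):
            chunk = data[offset:offset + BLOCK_SIZE]
            fingerprint = hashlib.sha256(chunk).hexdigest()
            if fingerprint not in self.unique_blocks:
                # New content: compress it and store it exactly once.
                self.unique_blocks[fingerprint] = zlib.compress(chunk)
            # Duplicate or not, the logical address just points at the fingerprint.
            self.block_map.append(fingerprint)
            self.logical_bytes += len(chunk)

    def read(self, lba: int) -> bytes:
        return zlib.decompress(self.unique_blocks[self.block_map[lba]])

    def physical_bytes(self) -> int:
        return sum(len(block) for block in self.unique_blocks.values())


store = InlineDedupStore()
# Sixteen logical blocks, but only two distinct contents.
store.write(b"A" * BLOCK_SIZE * 8 + b"B" * BLOCK_SIZE * 8)
print(len(store.block_map), len(store.unique_blocks))  # 16 2
print(store.logical_bytes, store.physical_bytes())
```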

Note: Flash drives are growing much larger. The speed of reading data from the drive will not keep pace with the amount of data it stores. As a result, we’ll have a potential data access bottleneck. Flash storage systems will need to optimize data layout on a drive, intelligently spread data across drives, and cache efficiently. Storage media – the more things change, the more they stay the same.
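The access bottleneck in the note above comes down to simple arithmetic: if capacity grows faster than read throughput, the time to read a whole drive rises and the bandwidth available per stored terabyte falls. The drive figures in this Python sketch are made up for illustration, not measurements of real products.

```python
def full_read_hours(capacity_tb: float, throughput_mb_s: float) -> float:
    """Hours to read an entire drive sequentially (decimal units: 1 TB = 1,000,000 MB)."""
    return capacity_tb * 1_000_000 / throughput_mb_s / 3600


def access_density(capacity_tb: float, throughput_mb_s: float) -> float:
    """Read bandwidth available per terabyte stored (MB/s per TB)."""
    return throughput_mb_s / capacity_tb


# Hypothetical drives: capacity quadruples while throughput only doubles.
for capacity_tb, throughput_mb_s in [(4, 500), (16, 1000)]:
    print(
        f"{capacity_tb} TB @ {throughput_mb_s} MB/s: "
        f"{full_read_hours(capacity_tb, throughput_mb_s):.2f} h to read, "
        f"{access_density(capacity_tb, throughput_mb_s):.1f} MB/s per TB"
    )
```

In this made-up example, the larger drive takes twice as long to read end to end (about 4.44 hours versus 2.22) and offers half the bandwidth per terabyte (62.5 versus 125.0 MB/s), which is why layout, placement across drives, and caching matter again.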

Making Media Reliable

Storage systems work hard to return the same data that was written. All hardware fails. Storage media fails in multiple ways. The device completely fails. The device incorrectly writes data. The device returns wrong data. Regardless of the type of hardware failure, storage systems work to ensure that the users never know. While the mission remains the same, flash has different failure behaviors than disk drives.
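One common way to catch a device that writes or returns the wrong data is to keep a checksum for each block separately from the block itself and verify it on every read. The sketch below assumes CRC32 (Python's `zlib.crc32`) and uses an in-memory dictionary to stand in for the media; the class name is hypothetical, and production systems may use stronger checksums and richer metadata. Note that it only detects corruption; repairing it requires redundancy such as RAID.

```python
import zlib


class ChecksummedDevice:
    """Hypothetical wrapper that detects silently corrupted blocks on read."""

    def __init__(self) -> None:
        self.media: dict[int, bytes] = {}    # stands in for the physical device
        self.checksums: dict[int, int] = {}  # kept apart from the data it protects

    def write(self, block: int, data: bytes) -> None:
        self.media[block] = data
        self.checksums[block] = zlib.crc32(data)

    def read(self, block: int) -> bytes:
        data = self.media[block]
        if zlib.crc32(data) != self.checksums[block]:
            # Detection only: recovery would come from redundant copies or parity.
            raise OSError(f"checksum mismatch on block {block}")
        return data


device = ChecksummedDevice()
device.write(7, b"quarterly results")
print(device.read(7))  # b'quarterly results'

# Simulate the drive returning wrong data without reporting an error.
device.media[7] = b"quarterly resulTs"
try:
    device.read(7)
except OSError as err:
    print(err)  # checksum mismatch on block 7
```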

Computer scientists have built companies, careers, and research groups on disk drive resiliency. Decades later, customers still debate their preferred RAID algorithms. As we move to larger drives, we’ve resurrected the mirror vs. RAID vs. ECC debates. Meanwhile, the industry has increased its focus on predicting and handling drive failures to reduce their impact. Additionally, some research shows that media errors (on a healthy drive) and firmware bugs pose a more insidious threat to your data than full drive failures: such events are both more common and less visible than failed drives. Thus, approaches like Data Domain’s Invulnerability Architecture have become a key market differentiator. Even in the year of “all-flash”, disk storage systems are evolving in the wake of their changing media.

Flash fails, but it fails differently than disk. The most obvious contrast is in “wear”. The mechanical components of disk drives wear out, but that breakdown is largely independent of the number of times the system writes to the disk. Flash media, by contrast, is built of cells that can only be written a limited number of times before they wear out and can no longer store data. As a result, storage systems have changed their write behaviors to minimize and distribute the wear on the media. These modifications include log-structured file systems to distribute writes evenly across the cells, space efficiency to reduce how many cells need to be written, and caching to absorb frequent overwrites of the same data.
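The log-structured idea can be sketched in a few lines: never overwrite in place; send every write to the least-worn free block and recycle the stale copy. This Python toy is hypothetical and makes a big simplifying assumption (one logical block per erasable physical block); real flash translation layers manage pages inside erase blocks, run garbage collection, and reserve spare capacity.

```python
import heapq


class WearLevelingLog:
    """Hypothetical toy log-structured allocator with wear leveling.

    Simplifying assumption: one logical block fits in one erasable physical
    block. Every write, including an overwrite, goes to the least-worn free
    block; the stale copy is erased (one more wear cycle) and recycled.
    """

    def __init__(self, num_physical_blocks: int) -> None:
        self.erase_counts = [0] * num_physical_blocks
        # Min-heap of (erase count, physical block): the least-worn block pops first.
        self.free = [(0, pbn) for pbn in range(num_physical_blocks)]
        heapq.heapify(self.free)
        self.mapping: dict[int, int] = {}     # logical block -> physical block
        self.contents: dict[int, bytes] = {}  # physical block -> stored data

    def write(self, lbn: int, payload: bytes) -> int:
        if not self.free:
            raise RuntimeError("no free physical blocks")
        _, pbn = heapq.heappop(self.free)
        self.contents[pbn] = payload
        stale = self.mapping.get(lbn)
        self.mapping[lbn] = pbn
        if stale is not None:
            # Erase the old copy and return it to the pool instead of rewriting it.
            del self.contents[stale]
            self.erase_counts[stale] += 1
            heapq.heappush(self.free, (self.erase_counts[stale], stale))
        return pbn

    def read(self, lbn: int) -> bytes:
        return self.contents[self.mapping[lbn]]


log = WearLevelingLog(num_physical_blocks=8)
for i in range(1000):
    log.write(0, f"version {i}".encode())  # hammer a single logical block

print(log.read(0))                                    # b'version 999'
print(sum(log.erase_counts))                          # 999
print(max(log.erase_counts) - min(log.erase_counts))  # 1
```

Even though the host rewrote one logical block 1,000 times, the 999 erases are spread almost perfectly across all eight physical blocks, rather than wearing out a single block as an in-place scheme would.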

All-flash arrays, for their part, must respond to the unique failure patterns of flash drives. First, we’re still learning how SSDs will fail. For example, how well will flash drives age? Unlike disk, where we have decades of experience tracking drive failures over time, we’re still learning with flash. (I know vendors are trying to simulate accelerated aging, but I’m skeptical. The only proven way to accelerate aging is to have children.) Fortunately, we have more analytic tools available than ever before. Meanwhile, all-flash arrays are evolving traditional RAID approaches to better fit the new media. With a preference for larger stripe sizes (to minimize space consumption), resiliency across all components (e.g., across power zones in a disk array enclosure), and multi-drive resiliency (N+2), these arrays show how flash has forced an evolution of media failure analytics and protection.
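The space argument for larger stripes is plain arithmetic. Assuming “larger” here means more data drives sharing the same two parity drives in an N+2 layout (my reading, not a detail spelled out above), the usable fraction of raw capacity is N/(N+2). A minimal Python sketch, with stripe widths chosen purely for illustration:

```python
from fractions import Fraction


def usable_fraction(data_drives: int, parity_drives: int = 2) -> Fraction:
    """Share of raw capacity that holds user data in an N+P stripe."""
    return Fraction(data_drives, data_drives + parity_drives)


for n in (4, 8, 20):  # illustrative stripe widths, not vendor defaults
    usable = usable_fraction(n)
    print(f"{n}+2: {float(usable):.1%} usable, {float(1 - usable):.1%} parity")
```

Going from 4+2 to 20+2 cuts parity overhead from about 33% to about 9% while still tolerating any two drive failures in the stripe.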

Hardware fails – whether it is disk drives, flash drives, or memory. Storage systems will evolve to combat those failures. Regardless of the media and its failure characteristics, storage systems will continue to deliver value by transforming inherently unreliable hardware into reliable data storage systems.

Conclusion

The disruption of storage media is driving the evolution of the storage system market. The basic needs haven’t changed. Customers want reliable storage that delivers the performance they need at the best possible cost. Flash storage changes many underlying assumptions, and storage systems are responding to the new media base. As a result, we’re all headed in the same direction.

The first question customers ask is this: can a new system more quickly and efficiently add all of the expected resiliency and functionality to its “all-flash” base, or can established systems more quickly and efficiently modify their battle-tested resiliency and functionality to leverage the “all-flash” media? The second question is whether any of these systems can deliver more value than customers have come to expect from traditional storage systems.

Before sharing my answer to those questions (giving time for each camp to bribe me – I do take t-shirts as payment), I will first discuss how business models and customer behaviors are changing in the next post.

Stephen Manley @makitadremel
