While my early work in flash caching focused on the flash devices we currently have, the next stage of my research looks at where flash is going.
A combination of customer needs and vendor competition is driving flash towards higher capacity and lower cost, through a variety of technical advancements. When I recently purchased a 128GB SD card for a digital camera, I was reminded of the now-useless 64MB cards I had purchased a decade ago.
A similar trend exists at the enterprise and cloud storage level. While a 100GB flash device was standard years ago, a prototype 60TB flash device was announced at the Flash Memory Summit in August. No price was announced, but I would guess it will cost over $10,000, which sounds high, but at potentially $0.20 per gigabyte, it would be among the cheapest flash devices per gigabyte. I don’t need this product in my home, but it will find a role in enterprise or cloud storage. How did flash technology evolve to the point where a 60TB flash device is a plausible product?
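As a quick sanity check on that guess, here is a minimal back-of-the-envelope sketch (the $0.20-per-gigabyte figure is the speculative number from above, not a published price):

```python
# Rough price estimate for a 60TB device at an assumed $0.20/GB.
capacity_gb = 60 * 1000        # 60 TB expressed in gigabytes
price_per_gb = 0.20            # speculative $/GB, not a vendor quote
total_price = capacity_gb * price_per_gb
print(f"${total_price:,.0f}")  # $12,000 -- consistent with "over $10,000"
```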
There are three main ongoing technical trends driving flash technology. All of the trends focus on increasing the density of flash storage. First, manufacturing processes continue to squeeze more bits into the same chip, which is analogous to denser hard drive capacities and more CPU cores.
Until recently, advancements in density came within the two-dimensional plane of a flash chip. Now, however, manufacturers are pursuing a second trend: stacking layers of flash cells vertically within a chip, which is conveniently called 3D stacking. Vendors are already stacking 48 or more layers, and research is pushing toward 100 or more layers in the coming years.
The third trend is to increase the information content of individual cells. At a simplistic level, a flash cell is a storage unit for electrons. Based on the charge of the cell, the value of the cell is either a 0 or a 1. This was the original design for flash cells, called a single-level cell (SLC). Instead of simply splitting the range of cell charges into two sub-ranges (representing 0 and 1), it is possible to divide the range further to produce four values (0, 1, 2 and 3), called a multi-level cell (MLC). Storing two bits per cell effectively doubled the capacity of a flash device. This process of dividing charge ranges continued to triple-level cells (TLC), which store three bits per cell (values 0 through 7), and we are even beginning to see quad-level cells (QLC), which store four bits per cell (values 0 through 15). QLC flash effectively has 4X the capacity of SLC flash, and flash vendors sell these denser products at cheaper prices per gigabyte.
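The capacity scaling follows directly from the bits stored per cell. A small sketch of the relationship (cell-type names and bit counts are the standard ones described above):

```python
# A cell storing n bits must distinguish 2**n charge levels,
# and holds n times the data of a single-level cell.
cell_types = {"SLC": 1, "MLC": 2, "TLC": 3, "QLC": 4}

for name, bits in cell_types.items():
    levels = 2 ** bits   # distinct charge sub-ranges per cell
    multiplier = bits    # capacity relative to SLC
    print(f"{name}: {levels} charge levels, {multiplier}x SLC capacity")
```

Note that each added bit doubles the number of charge levels that must be distinguished, while only adding one bit of capacity; that asymmetry is the root of the reliability issues discussed next.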
Impacts of Flash Technology Trends
Unfortunately, there is a downside to packing flash cells more tightly and finely slicing the charge ranges. Recording a value in a cell is a slightly destructive process that wears on the components of the cell. This leads to slight fluctuations in a cell’s charge and means the chance of incorrectly identifying a cell’s true value increases.
Flash manufacturers have addressed the risk of storing/reading data incorrectly by adding error correcting technology, but the end result is that flash lifespans are shrinking. Once a cell has been damaged from numerous writes, it can no longer be reliably written again, so the flash device marks it as invalid.
How should you assess the reliability of flash devices? First, think in terms of writes, since writes are what wear out the flash device; unmodified data can be read for as long as you want. Flash devices are designed to spread writes evenly across the cells, so we typically think about the write cycles of the entire device instead of individual cells. Therefore, think of a write cycle as a period during which the entire flash capacity is written. Whereas SLC flash could support 100,000 write cycles, MLC flash supports a few thousand, TLC flash around 1,000 and QLC flash perhaps a few hundred. This has led the industry to describe flash devices in terms of writes-per-day: the number of supported write cycles divided by the number of days in the expected lifetime of the device (e.g., 365 days times 5 years, or 1,825 days). A higher writes-per-day rating means the device can absorb more write traffic before its cells wear out and fail.
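The writes-per-day calculation above can be sketched in a few lines (the per-cell-type cycle counts are the rough figures quoted in this post; real vendor specifications vary widely):

```python
# Writes-per-day = supported write cycles / lifetime in days.
def writes_per_day(write_cycles, lifetime_years=5):
    return write_cycles / (lifetime_years * 365)

# Illustrative cycle counts from the discussion above.
for name, cycles in [("SLC", 100_000), ("MLC", 3_000),
                     ("TLC", 1_000), ("QLC", 300)]:
    print(f"{name}: {writes_per_day(cycles):.2f} writes/day")
```

By this estimate, a QLC device rated for a few hundred cycles supports well under one full-capacity write per day over a five-year lifetime, while SLC supports dozens.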
Performance characteristics are also changing with denser devices. Reading, writing and erasing are all getting slower. The fastest SLC flash devices could read in just 25 microseconds, write in 300 microseconds and erase in 1.5 milliseconds. Each time the charge range is divided to cram more logical values into a cell, these operations get slower; in some cases, doubling the bit density has doubled the latency of each operation. Reads have slowed for additional reasons as well, including interference from concurrent writes and background erasures.
A related issue is the throughput to read and write, which is the number of megabytes (or gigabytes) that can be read and written per second. While capacities are doubling, throughput has increased slowly with each generation of products. Flash devices are getting bigger but we can’t access the full space as quickly as before.
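One way to see this gap is to estimate how long it takes to write a device's full capacity at its rated throughput. The sketch below uses hypothetical capacities and throughputs, chosen only to illustrate the trend, not taken from any vendor specification:

```python
# Hours to write a device's full capacity at a given throughput.
# Capacities in GB, throughput in GB/s; all numbers are hypothetical.
def hours_to_fill(capacity_gb, throughput_gb_per_s):
    return capacity_gb / throughput_gb_per_s / 3600

print(f"{hours_to_fill(1_000, 2.0):.2f} h")   # e.g., a 1 TB device at 2 GB/s
print(f"{hours_to_fill(60_000, 3.0):.2f} h")  # e.g., a 60 TB device at 3 GB/s
```

A 60X jump in capacity paired with only a modest throughput increase means the time to touch the whole device grows by more than an order of magnitude.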
We tend to think of flash as blazingly fast, and that is true relative to the latency of reading and writing to hard drives, but flash is slowing down each year. It is an interesting trend of cheaper, denser, slower flash devices coming to market with shorter lifespans.
Storage System Design Changes
The evolution of flash has led to unexpected choices in system design. Sometimes we select smaller, more expensive (per gigabyte) flash devices because they have the lifespan needed for a storage system. In other cases we may select multiple, smaller flash devices over a single, large flash device to get the overall IOPS and throughput needed by a system. Simply picking the newest, biggest flash device isn’t always the right answer.
The evolution of flash also affects how to design smaller storage systems. As flash vendors have emphasized manufacturing larger and larger devices, we have run into situations where they no longer stock the low-capacity flash devices needed for storage systems targeted at small businesses. Just as some customers had to buy excess hard disk capacity in their storage system (for performance or reliability), the same may be coming for flash storage systems, too.
As a result, we are starting to consider tiers of flash, where a high performance cache uses SLC flash with the highest endurance and lowest latency while colder data is stored on denser, low endurance flash. Of course, that assumes flash itself won’t be supplanted by newer media.
While flash has been the hot media of the last decade and will continue to be heavily leveraged for years to come, there are new storage media coming to market and in the prototype phase. My next blog post will discuss NVMe and new non-volatile DRAM options and the potential to upend not only storage but servers as well.
~Philip Shilane @philipshilane