Moving Beyond Flash: Non-volatile Memory
I previously discussed my work with current flash products as well as the technology trends driving the future of flash. Next, I will peer into a fuzzy crystal ball at new storage media technologies coming to market.
The traditional storage hierarchy had DRAM at the top for fast access and hard disk drives at the bottom for persistent storage. Then we added flash to the hierarchy, either as a cache between the two layers or as a replacement for hard disk drives. But storage vendors do not stand still; they are innovating on new storage media. In this blog, I want to focus on two non-volatile memory developments: Non-volatile Memory Express (NVMe) and non-volatile DIMMs (NVDIMMs). For a primer on these topics, check out the proceedings of the 2016 Flash Memory Summit.
When performance engineers started measuring each step of a flash access, they found a surprising result. While flash itself is very fast, a sizable fraction of the access time is devoted to communication overhead just to reach the flash. NVMe was developed to address this problem: instead of the roughly 80 microseconds to read flash over a traditional interface, a read over NVMe may take about 7 microseconds. Any time you can improve access latency by an order of magnitude, it gets attention.
NVMe is really a protocol, not a medium itself, and Dell EMC is part of the specification committee considering better parallelization, deeper queues, and fine-grained user control. NVMe could be built with traditional flash or a newer technology. While flash-based NVMe products are currently coming to market, a more speculative technology may increase the lifespan of NVMe devices and upend the storage hierarchy.
Upgrading the D in DRAM
There is research into new storage media with improved characteristics relative to flash, such as nanosecond access times (vs. microseconds), millions of erasures (vs. thousands), and access with memory pointers (vs. block access with read/writes). These technologies bring a new alphabet soup of acronyms to learn: Phase Change Memory (PCM), Spin Transfer Torque (STT), Resistive RAM (RRAM), Carbon Nanotube RAM (NRAM), and others.
I won’t go through all of the details of these techniques, but it is enough to know that these are non-volatile storage media that fall somewhere between DRAM and flash for most characteristics. There has been work on these media for decades, but perhaps we are reaching the point where a mature product is ready for customers.
Intel and Micron leaped into the middle of the NVM space when they announced the 3D XPoint product for both NVMe and NVDIMMs. Unlike previous university and corporate research projects, the companies described roadmaps for the next few years. There have been enough internal meetings and vendor demonstrations that 3D XPoint looks real.
The implementation details are under wraps, but the best guess is that it is a variant of PCM. Accessed via NVMe, 3D XPoint largely solves the lifespan issues of flash by increasing the number of erasure cycles by several orders of magnitude. Used within NVDIMMs, it has the potential to extend server memory from gigabytes to terabytes. Imagine running multi-terabyte databases in memory. Imagine a storage system where flash is the slow media for colder content. Imagine compute where all data is available with a pointer instead of a read/write call that waits millions of CPU cycles to complete. Turning this imagined future into reality is the hard part.
I need to provide a quick disclaimer: the terminology in this community can be misleading. In my opinion, any product labeled “non-volatile memory (NVM)” should (1) be non-volatile (i.e., hold its value across power events) and (2) be memory (i.e., accessed through a software pointer, not a read/write call at a block granularity). Sadly, there are products with the NVM label that are both volatile and read/written in large blocks. I am (not) sorry to be pedantic, but that makes me angry and confuses customers.
Not the End of Storage
Do NVMe and NVDIMM products eliminate the need for clever storage design? Thankfully no, or I would be out of a job. While a new medium dramatically decreases the latency of data access, there are whole new storage architectures to figure out and programming models to design. The programming complexity of NVDIMMs is an area of active research because data persistence is not immediate: a flush command is needed to move data from the processor caches to the NVDIMM, which introduces its own delay.
Also, we are used to the idea of rebooting a computer if it is acting odd. That has traditionally cleared the memory and reloaded a consistent state from storage. Now, the inconsistent state in memory will still be there after a reboot. We’ll need to figure out how to make systems behave with these new characteristics.
Besides these persistence issues, we will need to explore the right caching/tiering model in combination with existing media. While more data can stay in persistent memory, not all of it will. Furthermore, we’ll still need to protect and recover the data, regardless of where the primary copy resides.
I don’t know the right answers to these problems, but we will find out. Considering how flash has upended the market, I can’t wait to see the impact of new non-volatile memory technologies under development. I know that, regardless of how we answer the questions, the storage world will look different in a few years.
~Philip Shilane @philipshilane