A flash cache that has high performance and the same endurances as the rest of the storage system? I recently returned from the Middleware 2015 conference in Vancouver, B.C. where I presented a paper on creating just such a cache. There has been a massive trend towards adding flash caches to traditional disk systems to lower latency. The idea is as old as CPU caches (L1, L2, etc.) and virtual memory — keep your frequently accessed data in a fast cache, and your system will run faster. Caching has been widely studied, and there are numerous advanced algorithms to improve cache hit ratios.
The key trade-off everyone makes with flash caches, though, is between performance and lifespan. If you constantly insert data into a cache and evict data to make space, you will use up the limited erase cycles of the flash device, and it stops working. If you don’t insert new data into the cache, it will not help your performance as much. If a storage system could maximize performance and endurance of its flash cache, it could optimize its cost per performance.
A commonly studied approach to address the limited erasures on flash is to group together megabytes of user data into containers before writing them to the device. Containers will be written and evicted as a whole, since this avoids the device juggling where to place small writes, which causes internal erasures. A new challenge, though, is that most caching research has focused on managing individual data blocks (e.g. 4KB) that are more applicable to the user data or application, but containers tend to be many megabytes in size (2MB-256MB ).
Caching algorithms for containers thus far have been fairly simple and have made decisions at the coarse granularity of containers instead of blocks. This approach works when blocks in containers are read a fairly similar number of times. In one example, though, we found a container with a block read over 800 times (hot), while most of the other blocks in the container were never read or read only once (cold). We also found that as clients delete files and overwrite portions of their data, blocks in the cache become invalidated, which results in wasted space in a valuable cache. All combinations of hot, cold, and invalid blocks may exist in a container. We refer to these containers as divergent, since they contain blocks of varying value to the client. A difficulty with divergent containers is that a frequently read block (hot) may keep a container in the cache, even if the rest of the container has useless data (cold or invalidated).
The question we address is: How can we get the advantages of advanced per-block caching techniques combined with the lifespan advantages of using containers while handling divergent data access patterns? Phrased another way, how do we manage containers? Which container should we evict? And once selected, should we evict the container in its entirety, or should we copy valuable blocks to a newly created container?
While there are many technical details about our algorithm in the paper, I want to give you a high level sense of our work. We combined two techniques to identify containers that are either completely cold or divergent. The first technique is a variant on standard least-recently-used caching but with the twist of managing containers instead of individual blocks. We used two least-recently-used lists: a hot list and a cold list. A container starts in the cold list, and if it is rarely accessed, it will be evicted. If the container is accessed enough times, it moves to the hot list, so it can stay in the cache longer than if we only had a single list of containers.
The second technique is that we give every container a survival time for its access pattern to hopefully become stable. We increase the survival time when there are reads to previously unread blocks, and we shorten the survival time as blocks are invalidated. When it is time to evict a container from the cache, we select a container that has reached the end of its survival time. If none exist, we select the least-recently-used container from the cold list.
Finally, to control erasures, we intelligently manage eviction so that data blocks accessed a greater number of times are preserved while lower-value blocks are removed. As erasures increase within a time period (i.e. an hour), the decision about a block’s value becomes stricter, and only valuable blocks are copied into a new container and written to flash.
We tested our technique by running numerous storage traces both through a flash simulator to measure internal erasures but also in a prototype using a real flash cache and an array of hard drives. We were able to achieve cache hit ratios (and performance speedups) close to the best techniques for individual block caching while dramatically reducing erasures.
With this technique, one can add a flash cache to storage systems with better assurance of low latencies and high performance over the lifespan of the system. In other words, you can have better performance at a lower cost.
 “Pannier: A Container-based Flash Cache for Compound Objects” by Cheng Li (Ph.D student intern), Philip Shilane, Fred Douglis, and Grant Wallace
-Philip Shilane @philipshilane