Managing Your Computing Ecosystem

Overview

There is a very real opportunity to take a giant step toward universal, interoperable management interfaces that are defined in terms of what your clients want to achieve. In the process, the industry can evolve away from today's complex, proprietary, and product-specific interfaces.

You’ve heard this promise before, but it’s never come to pass. What’s different this time? Major players are converging storage and servers. Functionality is commoditizing. Customers are demanding it more than ever.

Three industry-led open standards efforts have come together to collectively provide an easy to use and comprehensive API for managing all of the elements in your computing ecosystem, ranging from simple laptops to geographically distributed data centers.

This API is specified by:

  • the Open Data Protocol (OData) from OASIS
  • the Redfish Scalable Platforms Management API from the DMTF
  • the Swordfish Scalable Storage Management API from the SNIA

A management service conformant to the Redfish or Swordfish specifications provides a comprehensive interface for discovering the managed physical infrastructure, as well as for provisioning, monitoring, and managing the environmental, compute, networking, and storage resources that infrastructure provides. Such a management service is also an OData-conformant data service.

These specifications are evolving and certainly are not complete in all aspects. Nevertheless, they are already sufficient to provide comprehensive management of most features of products in the computing ecosystem.

This post and the following two will provide a short overview of each.

OData

The first effort is the definition of the Open Data Protocol (OData). OData v4 specifications are OASIS standards that have also begun the international standardization process with ISO.

Simply asserting that a data service has a RESTful API does nothing to assure that it is interoperable with any other data service. More importantly, REST by itself makes no guarantee that a client of one RESTful data service will be able to discover, or even know how to navigate, the RESTful API presented by some other data service.

OData enables interoperable utilization of RESTful data services. Such services allow resources, identified using Uniform Resource Locators (URLs) and defined in an Entity Data Model (EDM), to be published and edited by Web clients using simple HTTP messages. In addition to Redfish and Swordfish, a growing number of applications support OData data services, e.g. Microsoft Azure, SAP NetWeaver, IBM WebSphere, and Salesforce.

The OData Common Schema Definition Language (CSDL) specifies a standard metamodel used to define the Entity Data Model over which an OData service acts. The metamodel defined by CSDL is consistent with common elements of the UML v2.5 metamodel, which enables reliable translation into the programming language of your choice.

OData standardizes the construction of RESTful APIs. It provides standards for navigation between resources, for request and response payloads, and for operation syntax. It specifies how a client discovers the entity data model of the data service it is accessing, and how the resources defined by that entity data model can be discovered. While OData does not standardize the APIs themselves, it does standardize how payloads are constructed, along with a set of query options and many other items that often differ across today's RESTful data services. OData specifications utilize standard HTTP, AtomPub, and JSON, and standard URIs are used to address and access resources.
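As a minimal sketch of what that standardization buys you (the service root, the "Systems" entity set, and the "Name"/"Status" properties below are hypothetical, not taken from any particular product), a client can fetch a conformant service's entity data model from $metadata and then use the same standard query options against any OData v4 service:

```python
import requests

# Hypothetical OData v4 service root; a conformant service exposes its
# CSDL-described entity data model at the well-known $metadata location.
BASE = "https://example.com/odata/v4"

# Discover the entity data model.
metadata = requests.get(f"{BASE}/$metadata")
print(metadata.headers.get("Content-Type"))

# Standard query options ($select, $filter, $top) work the same way
# against any OData service; the entity set and field names are illustrative.
resp = requests.get(
    f"{BASE}/Systems",
    params={"$select": "Name,Status", "$filter": "Status eq 'OK'", "$top": 10},
    headers={"Accept": "application/json"},
)
for item in resp.json().get("value", []):
    print(item["Name"], item["Status"])
```

The point is not the specific entity set, but that discovery, navigation, and query syntax are the same regardless of which vendor implemented the service.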

The use of the OData protocol enables a client to access information from a variety of sources including relational databases, servers, storage systems, file systems, content management systems, traditional Web sites, and more.

Ubiquitous use will break down information silos and will enable interoperability between producers and consumers. This will significantly increase the ability to provide new and richer functionality on top of the OData services.

The OData specifications define:

  • the OData protocol, which governs the request/response interaction between a client and a data service
  • URL conventions for addressing and querying the resources a service exposes
  • the Common Schema Definition Language (CSDL) used to describe a service's entity data model
  • the JSON (and Atom) formats used for request and response payloads

Conclusion

While REST is a useful architectural style, it is not a standard, and the variation among RESTful APIs that express similar functions means there is no standard way to interact with different systems. OData lays the groundwork for interoperable management by standardizing the construction of RESTful APIs. Next up: Redfish.

 

~George Ericson @GEricson

 

Data Protection for Public Cloud Environments

In late 2015 I was researching the options available to protect application workloads running in public cloud environments. In this post I will discuss my findings, and what we are doing at Dell EMC to bring enterprise-grade Data Protection solutions to these workloads.

 

To understand how Data Protection applies to public cloud environments, we need to recognize that Data Protection can occur at different layers in the infrastructure. These include the server, storage, hypervisor (if virtualized), application and platform layer. When we implement Data Protection for on premises environments, our ability to exercise Data Protection functions at any one of these layers depends upon the technologies in use.

 

At the server layer, we typically deploy an agent-based solution that manages the creation of Data Protection copies of the running environment. This method can be used for virtualized, bare metal and even containerized environments that persist data.

 

At the application layer we typically rely on the applications’ native data protection functions to generate copies (usually to file system or pseudo media targets). Examples of this can include database dumps to local or remote disk storage. We can go a step further and embed control-and-data path plugins into the application layer to enable the application’s native data protection methods to interface with Data Protection storage for efficient data transfer and Data Protection management software for policy, scheduling, reporting and audit purposes.

 

Like the server approach, the application native approach is agnostic to the platform the application is running on, be it virtualized, bare metal or containerized, in public or private cloud environments. Where things get interesting is when we start leveraging infrastructure layers to support Data Protection requirements. The most common infrastructure layers used are the hypervisor and storage-centric Data Protection methods.

 

A by-product of infrastructure methods is that they require privileged access to the infrastructure's interfaces to create protection copies. In private cloud environments this requires coordination and trust between the Backup or Protection Administrator and the Storage or Virtualization Administrator. This access is often negotiated when the service is first established. In Public Cloud environments there is no Storage or Virtualization Administrator we can talk with to negotiate access. These layers are off limits to consumers of the Public Cloud. If we want to exercise Data Protection at these layers, we have to rely on the services that the Public Cloud provider makes available. These services are often referred to as Cloud-based Data Protection.

 

For example, Amazon Web Services (AWS) offers snapshots of Elastic Block Store (EBS) volumes to S3 storage, which provides protection of volumes at the block level. Microsoft Azure offers snapshots of VMs to Azure Blob Storage, as well as the Azure Backup service for VM instances running the Windows operating system.
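As a minimal sketch of the AWS case (the region and volume ID below are placeholders), an EBS snapshot can be requested through the EC2 API, for example with boto3:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # placeholder region

# Request a point-in-time, block-level snapshot of an EBS volume;
# AWS stores the snapshot data in S3 on our behalf.
response = ec2.create_snapshot(
    VolumeId="vol-0123456789abcdef0",        # placeholder volume ID
    Description="Nightly protection copy",
)
print(response["SnapshotId"], response["State"])
```

Note that the resulting snapshot lives entirely inside AWS, which is exactly the coupling discussed below.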

 

A common property of Cloud-based Data Protection services, and of infrastructure-centric protection methods for that matter, is that they are tightly coupled. Tight coupling means the technologies and methods are highly dependent on one another to function, which allows the method to perform at peak efficiency. For example, the method can track the underlying data changing in a virtual machine instance and, when appropriate, take copies of only the data that has changed since the last copy.

 

Tightly coupled methods have gained popularity in recent years simply because data volumes continue to grow to the extent that traditional methods are struggling to keep up. However, there are some important trade-offs being made when we bet the business solely on tightly coupled Data Protection methods.

 

Tight coupling trades flexibility for efficiency. In other words, we can have a very efficient capability, but it is highly inflexible. In the case of Data Protection, a solution focused on flexibility allows one to free the data copies from the underlying infrastructure. In the case of AWS snapshot copies to S3, by contrast, the copies are forever tied to the public cloud platform. This is a critical point that requires careful attention when devising a Public Cloud Data Protection strategy.

 

The best way I can describe the implications is to compare the situation to traditional on premises Data Protection methods. With on premises solutions, you are in full control of the creation, storage and recovery processes. For example, let us assume you have implemented a protection solution using a vendor's product. This product would normally implement and manage the process of creating copies and storing these copies on media in the vendor's data format (which in modern times is native to the application being protected). The property we usually take for granted here is that we can move these copies from one media format to another, or from one location to another. We can also recover them to different systems and platforms. This heterogeneity offers flexibility, which enables choice: the choice to change our mind or adjust our approach to managing copies as conditions change. For example, with loosely coupled copies, we can migrate them from one public cloud provider's object storage (e.g. AWS S3) to another public cloud provider's object storage (Azure Blob Storage), or even back to private cloud object storage (Elastic Cloud Storage), if we decide to bring certain workloads on premises.

 

Despite these trade-offs, there are very good reasons to use a public cloud provider's native Data Protection functions. For example, if we want fast full VM recovery back to the source, we would be hard pressed to find a faster solution. However, cloud-native solutions do not address all recovery scenarios and lack flexibility. To mitigate these risks, a dual approach is often pursued that addresses the efficiency, speed and flexibility required by Enterprise applications in public, private or hybrid cloud models.

 

My general advice to customers is to leverage tightly coupled Data Protection methods for short-lived Data Protection requirements, alongside loosely coupled methods. In the case of Public Cloud models, this requires the deployment of software technologies (or hardware, via services like Direct Connect and ExpressRoute) that are not tied to the Public Cloud provider's platform or data formats. As a consumer of Public Cloud services, this will afford you the flexibility to free your data copies in the future, should the need arise.

 

Our Strategy

 

At Dell EMC we recognize that customers will deploy workloads across a variety of cloud delivery models. These workloads will require multiple forms of Data Protection, based on the value of the data and the desire to maintain independence from the underlying infrastructure or platform hosting the workloads.

 

Our strategy is to provide customers with Data Protection everywhere. This protection will be delivered via multiple avenues, including orchestrating the control path of Public Cloud providers' native solutions, and allowing the Public Cloud to host and manage the data path and storage. For workloads that require ultimate flexibility and independent Data Protection copies, we will also manage the data path and storage, to enable copies to remain agnostic to the cloud vendor. Furthermore, for customers that choose to consume SaaS-based solutions, we will continue to work with SaaS vendors to expand our existing SaaS Data Protection offering to export and manage data copies using the vendors' available APIs, to the extent possible.

 

Ultimately, customers will choose which path they take. Our strategy is to ensure our Data Protection solutions allow customers to take any path available to them.

 

~Peter Marelas @pmarelas

Mathematics, Big Data, and Joss Whedon

Definition 1: The symmetric difference of two sets A and B, denoted $A \,\Delta\, B$, is the set of elements in either A or B, but not in their intersection.
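Stated symbolically, with a small example for concreteness:

$A \,\Delta\, B = (A \setminus B) \cup (B \setminus A) = (A \cup B) \setminus (A \cap B)$, e.g. $\{1,2,3\} \,\Delta\, \{2,3,4\} = \{1,4\}$.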

Let A be “Mathematics”, and let B be “Data Science”. This is certainly not the first article vying for attention with the latter buzzword, so I’ll go ahead and insert a few more here to help boost traffic and readership:

Analytics, Machine Learning, Algorithm,

Neural Networks, Bayesian, Big Data

These formerly technical words (except that last one) used to live solidly in the dingy faculty lounge of set A. They have since been distorted into vague corporate buzzwords, shunning their well-defined mathematical roots for the sexier company of “synergy”, “leverage”, and “swim lanes” at refined business luncheons. All of the above words have allowed themselves to become elements of the nebulous set B: “Data Science”. As the entire corporate and academic world scrambles to rebrand themselves as members of Big Data™, allow me to pause the chaos in order to reclaim set A.

This isn’t to say that set B is without its merits. Data Science is Joss Whedon, making the uncool comic books so hip that Target sells T-shirts now. The advent of powerful computational resources and a worldwide saturation of data have sparked a mathematical revival of sorts. (It is actually possible for university mathematics departments to receive funding now.) Data Science has inspired the development of methods for quantifying every aspect of life and business, many of which were forged in mathematical crucibles. Data science has built bridges between research disciplines, and sparked some taste for a subject that was previously about as appetizing to most as dry Thanksgiving turkey without gravy. Data science has driven billions of dollars in sales across every industry, customized our lives to our particular tastes, and advanced medical technology, to name a few.

Moreover, the techniques employed by data scientists have mathematical roots. Good data scientists have some mathematical background, and my buzzwords above are certainly in both sets. Clearly, $A \cap B$ is nonempty, and the two sets are not disjoint. However, the symmetric difference between the two sets is large. Symbolically, $|A \,\Delta\, B| \gg |A \cap B|$. To avoid repetition of the plethora of articles about Data Science, our focus will be on the elements of mathematics that data science lacks. In mathematical symbols, we investigate the set $A \setminus B$.

Mathematics is simplification. Mathematicians seek to strip a problem bare. Just as every building has a foundation and a frame, every “applied” problem has a premise and a structure. Abstracting the problem into a mathematical realm identifies the facade of the problem that previously seemed necessary. An architect can design an entire subdivision with one floor plan, and introduce variation in cosmetic features to produce a hundred seemingly different homes. Mathematicians reverse this process, ignoring the unnecessary variation in building materials to find the underlying structure of the houses. A mathematician can solve several business problems with one good model by studying the anatomy of the problems.

Mathematics is rigor. My real analysis professor in graduate school told us that a mathematician’s job is two-fold: to break things and to build unbreakable things. We work in proofs, not judgment. Many of the data science algorithms and statistical tests that get name dropped at parties today are actually quite rigorous, if the assumptions are met. It is disingenuous to scorn statistics as merely a tool to lie; one doesn’t blame the screwdriver that is being misused as a hammer. Mathematicians focus on these assumptions. A longer list of assumptions prior to a statement indicates a weak statement; our goal is to strip assumptions one by one to see when the statement (or algorithm) breaks. Once we break it, we recraft it into a stronger statement with fewer assumptions, giving it more power.

Mathematics is elegance. Ultimately, this statement is a linear combination of the previous two, but still provides an insightful contrast. Data science has become a tool crib of “black box” algorithms that one employs in his language of choice. Many of these models have become uninterpretable blobs that churn out an answer (even good ones, by many measures of performance; pick your favorite: p-values, Euclidean distance, prediction error). They solve the specific problem given wonderfully, molding themselves to the given data like a good pair of spandex leggings. However, they provide no structure, no insight beyond that particular type of data. Understanding the problem takes a back seat to predictions, because predictions make money, especially before the end of the quarter. Vision is long-term and expensive. This type of thinking is short-sighted; with some investment, that singular dataset may reveal a structure that is isomorphic to another problem in an unrelated department, and even one that may be exceedingly simple in nature. In this case, mathematics can provide an interpretable, elegant solution that solves multiple problems, provides insight into behavior, and still retains predictive power.

As an example, let us examine the saturated research field of disk failures. There is certainly no shortage of papers that develop complex algorithms for disk failure prediction; typically the best performing ones are an ensemble method of some kind. Certain errors are good predictors of disk failure, for instance, medium errors and reallocated sectors. These errors evolve randomly, but always increase. A Markov chain fits this behavior perfectly, and we have developed the method to model these errors. Parameter estimation is a challenge, but the idea is simple, elegant, and interpretable. Because the mathematics are so versatile, with just one transition matrix a user can answer almost any question he likes without needing to rerun the model. This approach allows for both predictive analytics and behavior monitoring, is quick to implement, and is analytically (in the mathematical sense) sound. The only estimation needed is in the parameters, not in the model structure itself. Effective parameter estimation will effectively guarantee good performance.
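A minimal sketch of the idea follows. The states and transition probabilities here are made up for illustration, not the fitted parameters of our model; the point is that a single transition matrix answers many questions without re-running anything.

```python
import numpy as np

# Illustrative 4-state chain over accumulated media errors / reallocated sectors.
# Errors only increase, so the matrix is upper triangular; "failed" is absorbing.
states = ["low", "medium", "high", "failed"]
P = np.array([
    [0.95, 0.04, 0.01, 0.00],
    [0.00, 0.90, 0.08, 0.02],
    [0.00, 0.00, 0.85, 0.15],
    [0.00, 0.00, 0.00, 1.00],
])

healthy = np.array([1.0, 0.0, 0.0, 0.0])   # a disk with few errors today

# The same matrix yields the failure probability over any horizon.
for steps in (4, 12, 52):
    dist = healthy @ np.linalg.matrix_power(P, steps)
    print(f"P(failed within {steps} steps) = {dist[-1]:.3f}")
```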

There is room for both data scientists and mathematicians; the relationship between a data scientist and a mathematician is a symbiotic one. Practicality forces a quick or canned solution at times, and sometimes the time investment needed to “reinvent the wheel” when we have (almost) infinite storage and processing power at hand is not good business. Both data science and mathematics require extensive study to be effective; one course on Coursera does not make one a data scientist, just as calculus knowledge does not make one a mathematician. But ultimately, mathematics is the foundation of all science; we shouldn’t forget to build that foundation in the quest to be industry Big Data™ leaders.

 

~Rachel Traylor @mathpocalypse

 

How I Learned to Stop Worrying and Love New Storage Media: The Promises and Pitfalls of Flash and New Non-volatile Memory Part III

Moving Beyond Flash: Non-volatile Memory

I previously discussed my work with current flash products as well as technology trends driving the future of flash. Next, I will look into a fuzzy crystal ball at new storage media technologies coming to market.

The traditional storage hierarchy had DRAM at the top for fast accesses and hard disk drives at the bottom for persistent storage. Then we added flash to the hierarchy, either as a cache between the two layers or as a replacement for hard disk drives. But the storage vendors do not stand still; they are innovating on new storage media. In this blog, I want to focus on two non-volatile memory technologies: Non-volatile Memory Express (NVMe) and non-volatile DIMMs (NVDIMMs). For a primer on these topics, check out the proceedings of the 2016 Flash Memory Summit.

NVMe

When performance engineers started measuring the time each step takes when accessing flash, they found a surprising result. While flash is very fast, a sizable fraction of the access time is devoted to communication overheads just to reach flash. NVMe was developed to address this problem. Now, a read that takes 80 microseconds on a traditional flash device may take 7 microseconds with NVMe. Any time you can improve access latency by an order of magnitude, it gets attention.

NVMe is really a protocol, not a media itself, and Dell EMC is part of the specification committee considering better parallelization, deeper queues, and fine-grained user control. NVMe could be built with traditional flash or a newer technology. While flash-based NVMe products are currently coming to market, a more speculative technology may increase the lifespan of NVMe devices and up-end the storage hierarchy.

Upgrading the D in DRAM

There is research into new storage media with improved characteristics relative to flash, such as nanosecond access times (vs. microseconds), millions of erasures (vs. thousands), and access with memory pointers (vs. block access with read/writes). These technologies are a new alphabet soup of acronyms to learn: Phase Change Memory (PCM), Spin Transfer Torque (STT), Resistive RAM (RRAM), Carbon Nanotube RAM (NRAM), and others.

I won’t go through all of the details of these techniques, but it is enough to know that these are non-volatile storage media that fall somewhere between DRAM and flash for most characteristics. There has been work on these media for decades, but perhaps we are reaching the point where a mature product is ready for customers.

Intel and Micron leaped into the middle of the NVM space when they announced the 3D XPoint product for both NVMe and NVDIMMs. Unlike previous university and corporate research projects, the companies described roadmaps for the next few years. There have been enough internal meetings and vendor demonstrations that 3D XPoint looks real.

The implementation details are under wraps, but the best guess is that it is a variant of PCM. Accessed via NVMe, 3D XPoint largely solves the lifespan issues with flash by increasing the number of erasure cycles by several orders of magnitude. Used within NVDIMMs, it has the potential to extend server memory from gigabytes to terabytes. Imagine running multi-terabyte databases in memory. Imagine a storage system where flash is the slow media for colder content. Imagine compute where all data is available with a pointer instead of a read/write call that waits millions of CPU cycles to complete. Turning these imaginations into reality is the hard part.

I need to provide a quick disclaimer: the terminology in this community can be misleading. In my opinion, any product labeled “non-volatile memory (NVM)” should 1. be non-volatile (i.e. hold its value across power events), and 2. be memory (i.e. accessed by a software pointer, not a read/write call at a block size). Sadly, there are products with the NVM label that are both volatile and read/written in large blocks. I am (not) sorry to be pedantic, but that makes me angry and confuses customers.

Not the End of Storage

Do NVMe and NVDIMM products eliminate the need for clever storage design? Thankfully no, or I would be out of a job. While a new media dramatically decreases the latency of data access, there are whole new storage architectures to figure out and programming models to design. The programming complexities of NVDIMMs are an area of active research because data persistence is not immediate. A flush command is needed to move data from processor caches to NVDIMMs, which introduces its own delay.

Also, we are used to the idea of rebooting a computer if it is acting odd. That has traditionally cleared the memory and reloaded a consistent state from storage. Now, the inconsistent state in memory will still be there after a reboot. We’ll need to figure out how to make systems behave with these new characteristics.

Besides these persistence issues, we will need to explore the right caching/tiering model in combination with existing media. While more data can stay in persistent memory, not all of it will. Furthermore, we’ll still need to protect and recover the data, regardless of where the primary copy resides.

I don’t know the right answers to these problems, but we will find out. Considering how flash has upended the market, I can’t wait to see the impact of new non-volatile memory technologies under development. I know that, regardless of how we answer the questions, the storage world will look different in a few years.

~Philip Shilane @philipshilane

How I Learned to Stop Worrying and Love New Storage Media: The Promises and Pitfalls of Flash and New Non-volatile Memory Part II

Flash Tomorrow

While my early work in flash caching focused on the flash devices we currently have, the next stage of my research looks at where flash is going.

A combination of customer needs and vendor competition is driving flash towards higher capacity and lower cost, through a variety of technical advancements. When I recently purchased a 128GB SD card for a digital camera, I was reminded of the now-useless 64MB cards I had purchased a decade ago.

A similar trend exists at the enterprise and cloud storage level. While a 100GB flash device was standard years ago, a prototype 60TB flash device was announced at the Flash Memory Summit in August. No price was announced, but I would guess it will cost over $10,000, which sounds high, but at potentially $0.20 per gigabyte, it will be among the cheapest flash devices, per gigabyte. I don’t need this product in my home, but it will find a role in enterprise or cloud storage. How did flash technology evolve to the point where a 60TB flash device is a plausible product?

Flash Trends

There are three main ongoing technical trends driving flash technology. All of the trends focus on increasing the density of flash storage. First, manufacturing processes continue to squeeze more bits into the same chip, which is analogous to denser hard drive capacities and more CPU cores.

Until recently, advancement in density has been within the two-dimensional plane of flash chips. Now, however, researchers are pursuing a second trend: stacking flash chips within a device, which is conveniently called 3D stacking. Vendors are stacking 48 or more layers already, and there is research pushing to 100 or more layers in the coming years.

The third trend is to increase the information content of individual cells. At a simplistic level, a flash cell is a storage unit for electrons. Based on the charge of the cell, the value of the cell is either a 0 or a 1. This was the original design for flash cells, called a single-level cell (SLC). Instead of simply splitting the range of cell charges into two sub-ranges (representing 0 and 1), it is possible to further divide the ranges again to produce four values (0, 1, 2 and 3), called a multi-level cell (MLC). This effectively doubled the capacity of a flash device. This process of dividing charge ranges continued to triple-level cells (TLC) with cell values of 0 through 7, and we are even beginning to see quad-level cells (QLC) with cell values 0 through 15. QLC flash stores four bits per cell and so effectively has 4X the capacity of SLC flash, and flash vendors sell these denser products at cheaper prices per gigabyte.
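A quick sketch of the arithmetic: capacity scales with the number of bits stored per cell, while the number of charge levels the cell must distinguish grows exponentially, which is why each step gets harder.

```python
# Bits per cell vs. charge levels and capacity relative to SLC.
cell_types = {"SLC": 1, "MLC": 2, "TLC": 3, "QLC": 4}

for name, bits in cell_types.items():
    levels = 2 ** bits   # distinct charge levels the cell must distinguish
    print(f"{name}: {bits} bit(s)/cell, {levels} charge levels, {bits}x SLC capacity")
```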

Impacts of Flash Technology Trends

Unfortunately, there is a downside to packing flash cells more tightly and finely slicing the charge ranges. Recording a value in a cell is a slightly destructive process that wears on the components of the cell. This leads to slight fluctuations in a cell’s charge and means the chance of incorrectly identifying a cell’s true value increases.

Flash manufacturers have addressed the risk of storing/reading data incorrectly by adding error correcting technology, but the end result is that flash lifespans are shrinking. Once a cell has been damaged from numerous writes, it can no longer be reliably written again, so the flash device marks it as invalid.

How should you assess the reliability of flash devices? First, you want to think in terms of writes, since that’s what wears out the flash device. You should be able to read unmodified data for as long as you want. Flash devices are designed to spread writes evenly across the cells, so we typically think about the write cycles of the entire device instead of individual cells. Therefore, think of a write cycle as a period where the entire flash capacity is written. Whereas SLC flash could support 100,000 write cycles, MLC flash supports a few thousand, TLC flash around 1,000 and QLC flash perhaps a few hundred.  This leads to the industry using the term writes-per-day to describe flash devices. Writes-per-day is equal to the number of supported write cycles divided by the number of days in the expected lifetime of the device (e.g. 365 days times 5 years). More writes-per-day means that the flash device is less likely to have cells wear out and fail.
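A small worked example of that definition follows. The cycle counts are the rough figures quoted above (the MLC and QLC values in particular are ballpark assumptions), and the 5-year lifetime is an assumption for illustration.

```python
def writes_per_day(write_cycles, lifetime_years=5):
    """Supported full-device writes per day over the device's expected lifetime."""
    return write_cycles / (365 * lifetime_years)

# Approximate write-cycle ratings discussed above.
for media, cycles in [("SLC", 100_000), ("MLC", 3_000), ("TLC", 1_000), ("QLC", 300)]:
    print(f"{media}: ~{writes_per_day(cycles):.1f} writes per day")
```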

Performance characteristics are also changing with denser devices. Reading, writing and erasing are all getting slower. The fastest SLC flash devices could read in just 25 microseconds, write in 300 microseconds and erase in 1.5 milliseconds.  Each time the charge range is divided to cram more logical values into a cell, these operations have gotten slower. In some cases, doubling the bit density has doubled the latency of each operation. Reads have slowed for a number of reasons including writes and background erasures interfering with reads.

A related issue is the throughput to read and write, which is the number of megabytes (or gigabytes) that can be read and written per second. While capacities are doubling, throughput has increased slowly with each generation of products. Flash devices are getting bigger but we can’t access the full space as quickly as before.

We tend to think of flash as blazingly fast, and that is true relative to the latency of reading and writing to hard drives, but flash is slowing down each year. It is an interesting trend of cheaper, denser, slower flash devices coming to market with shorter lifespans.

Storage System Design Changes

The evolution of flash has led to unexpected choices in system design. Sometimes we select smaller, more expensive (per gigabyte) flash devices because they have the lifespan needed for a storage system. In other cases we may select multiple, smaller flash devices over a single, large flash device to get the overall IOPS and throughput needed by a system. Simply picking the newest, biggest flash device isn’t always the right answer.

The evolution of flash also affects how to design smaller storage systems. As flash vendors have emphasized manufacturing larger and larger devices, we have run into situations where they no longer stock low capacity flash devices needed for storage systems targeted at small businesses. Just as some customers had to buy excess hard disk capacity in their storage system (for performance or reliability), the same may be coming for flash storage systems, too.

As a result, we are starting to consider tiers of flash, where a high performance cache uses SLC flash with the highest endurance and lowest latency while colder data is stored on denser, low endurance flash. Of course, that assumes that flash won’t be supplanted by newer media, itself.

While flash has been the hot media of the last decade and will continue to be heavily leveraged for years to come, there are new storage media coming to market and in the prototype phase. My next blog post will discuss NVMe and new non-volatile DRAM options and the potential to upend not only storage but servers as well.

 

~Philip Shilane @philipshilane

The Hashtag Cortex

Escaping the deadly radiation of the tech industry pulsar, this time Inside The Data Cortex.

  • This year has been “The Year of all Flash” and Mark didn’t notice.
  • Weeks after day one Stephen and Mark discuss day one. It was kind of like day zero and not much different than day two. But day two had the world’s largest donut at Dell EMC World.
  • Weight gain and not much weight loss at tradeshows.
  • Stephen on the Goldilocks approach to embracing the public cloud and the tyranny of selection bias.
  • Do Google consider themselves an enterprise supplier?
  • This time of year there’s no sunshine anywhere outside of California. Says man living in California.
  • Software Defined Storage is kind of interesting. Says customer who thinks the installation packages will do everything.
  • Scale out is still a hard problem.
  • Mark has looked at home grown storage solutions and sees a lot of ugly babies. (Sorry! He’s not sorry.)
  • The Botnet of Things is real and your dishwasher is hitting someone with a denial of service attack right now.
  • This episode in reading things: Alcatraz Versus the Evil Librarians, Benjamin Franklin: An American Life, Steinbeck’s The Winter of Our Discontent, Ken Clarke’s Kind of Blue, and Stalin: Paradoxes of Power.

No one likes to give up power. Go before you are pushed. Because it will be people like us doing the pushing.


Download this episode (right click and save)

Subscribe to this on iTunes

Get it from Podbean

Follow us on Pocket Casts
Stephen Manley @makitadremel Mark Twomey @Storagezilla

The Origins of eCDM: Team Houston

Dell EMC recently announced Enterprise Copy Data Management (eCDM), a product that enables global management and monitoring of data copies across primary and protection storage. Perhaps just as interesting as eCDM is the way that the product was conceived, designed, developed, and taken to market. Like the trailblazers of the west, the product team behind eCDM was faced with the daunting challenge of exploring uncharted territory. They created a product from scratch using agile methodologies, open source technology, a brand new UI, and an entirely custom go-to-market strategy.

This is the final post in a series that details the challenges and successes of the product team from conception to release. The first and second posts can be found here and here, respectively.

Designing and developing a product such as eCDM is a major feat, but it’s really only half of the story when discussing emerging products. The other half of the challenge is taking the product to market. In a company that is so accustomed to acquiring new technology, the eCDM product team needed to address the rarely encountered challenge of creating a go-to-market strategy from scratch. However, starting from nothing has its advantages. The eCDM team’s strategy began with using customer feedback differently. The development team continuously and directly addressed customer feedback during design and implementation to build a more valuable and usable product.

In order to anticipate the needs of the customer, a specialized team of quality engineering experts called Team Houston was created to provide the customer voice to the engineering team. The group quickly became a core component of the eCDM go-to-market strategy by performing end to end testing and developing a deep understanding of the product from a customer viewpoint. David Sandock, a senior member of Team Houston, explained that his team provides a unique perspective of the product by testing features as if he and his team were actual customers. “We concentrate on areas that we anticipate to be most frequented by the end user and then we try to break them,” David said while discussing the role of his team. Unlike traditional engineering or quality teams, most members of Team Houston are not on scrum teams; as a result, they are free to explore and test any part of the product.

Team Houston also plays a major role in gathering and executing on customer feedback. David and the team helped facilitate the hosted beta. They took their experiences directly back to the engineering team, removing the pesky barrier between the engineer and the customer. Through this model and the diligence of Team Houston, the engineering team addressed nearly all of the customer feedback from the hosted beta through bug fixes, new features, or user experience improvements. “We’re listening,” David told me, and the engineering team is certainly taking action to address what Team Houston hears.

The product management team is also listening closely to customer feedback. Robert Hammond, the go-to-market product manager for eCDM, spends nearly 40% of his time in front of customers to learn and help solve the challenges that they face with data protection. As Rob puts it, “I don’t learn anything in my office. The only way to learn what we should be building is to interact with the customer.” With emerging products, it becomes increasingly important to understand deeply the problems that customers are facing in order to address them.

eCDM was designed from customer feedback, and future product iterations will continue to reflect countless customer conversations to properly address customer needs. Rob and the product management team have spent countless hours understanding customer needs and developing requirements that address many of the challenges that customers face with self-service data protection. “What we’ve built has come directly from listening to our customers,” Rob said, “and we’ll continue to listen in order to address the needs of our customers.”

As eCDM evolves, it is fueled by customer input. Team Houston and the Product Management team have done an exceptional job of taking a new product to market in a way that leverages customer interaction to improve eCDM. Team Houston plays an instrumental role in engaging the development team with customer feedback, and the product management team has included many features based directly on customer input. Together, these teams enable eCDM to enter the market as a well-defined and user-focused product.

David Sandock has spent the last 9 years working in the Israeli tech scene, having spent the last 3 years with EMC as a Senior Software Quality Engineer working for RecoverPoint. He recently relocated to the US to take a role with Dell EMC’s eCDM product.  He has been primarily responsible for customer focused end-to-end testing, while being a focal point for the different storage technologies used in testing the eCDM product and as a direct link between testing teams and the product management team.

Robert Hammond is a product manager on the eCDM Team. He is focused on helping customers successfully adopt eCDM while sharing what he has learned from customers with the rest of the product & engineering teams.  Prior to this role, Robert held various product, marketing and pre-sales roles at Dell, Amazon and a few startups.

 

~Tyler Stone @tyler_stone_

Honoring Dell EMC’s Core Technologies Technical Directors

In the modern business world, executives get all the external recognition. It’s just a few weeks into the Dell acquisition of EMC, and most people already know names like Marius Haas, Jeff Clarke, David Goulden, Howard Elias, and Rory Read. Some of them even have their own Wikipedia pages.

A company like Dell EMC, however, cannot succeed without people who design, build, and ship the products that the executives talk about. Therefore, every quarter, we recognize the newest Dell EMC Core Technologies Technical Directors. These are senior technical leaders who have delivered a sustained business impact through products, solutions, and customer experience. They are the engine of the company. The previous recipients are detailed here.

Of course, Core Technologies continues to deliver innovative solutions, so we continue to expand the roster of Technical Directors. This quarter I’m pleased to announce:

Frederic Corniquet – Frederic has been a leader in the NAS protocols for the midrange systems for over a decade. Frederic has been a driving force in EMC’s NAS offerings growing in both technical strength and customer adoption from VNX1 to VNX2 to Unity. Frederic’s expertise extends from the NAS protocol to security to integrating with VMware for NFS data stores. As a leader in EMEA, Frederic also evangelizes and connects with some of our biggest customers. Frederic is a technical leader, evangelist, and expert who is growing EMC’s NAS business.

Rajesh Nair – Rajesh has been a leader in NetWorker for over a decade, focused on solving our largest customers’ most difficult backup challenges. He began by working on image-level backups (SnapImage) to solve customers’ large file system backup challenges. He then delivered NetWorker’s NDMP tape solution, solving customers’ large NAS file system backup challenges. Rajesh then led the team to integrate Data Domain BOOST into NetWorker, which solves performance and networking scaling challenges. Today more than half of NetWorker customers leverage BOOST. Rajesh’s decade of innovation, delivery, and leadership have driven NetWorker to be the customers’ choice for the most difficult backup challenges.

Tom Papadakis – Applications. IT teams want to speak to their application owners. Tom has led application-centric data protection for almost 20 years. Tom began by making NetWorker indexes scale for application backup. Then he developed NetWorker’s Oracle integration, which allows DBAs and backup administrators to work independently while retaining centralized control. Tom also brings a customer and sales-centric viewpoint to application protection. He spearheaded the creation of NMDA – a package that combines the application support for multiple applications. The result was a dramatically improved total customer experience. As application protection spans all of data protection, Tom has also brought together Data Domain (via DDBEA integration) and Avamar. Application protection is the present and future of backup, and Tom has been at the forefront of that mission.

Ian Wigmore – Ian specializes in making products run fast. He began in the Symmetrix Microcode group connecting the Symmetrix to IBM’s S/390 and Z-series mainframes via the FC-2 software layers in the FICON storage director. To say that mainframe applications and users are sensitive to latency is an understatement. Ian’s performance tuning helped the Symmetrix (now VMAX) be the storage of choice for the most demanding mainframe applications. Ian was one of the initial leaders on ProtectPoint for VMAX – a product that improves backup times from VMAX to Data Domain by up to 20x. This has uniquely solved the challenge of large database backups. Fast and simple – whether it is mainframe, migration, or backup – Ian’s work separates EMC technology from the competition.

Dorota Zak – Flexibility of choice. Dorota protects customers’ core applications regardless of the tool. While she began in NetWorker, Dorota was instrumental in expanding Avamar’s application support, adding Sybase and SAP support, which helped solve backup challenges for many enterprises. Dorota then created the framework for DDBEA (which enables application admins to protect their data directly to Data Domain, without using backup software), so that it could quickly support the six key databases. That then extended to supporting key applications for Dell EMC’s industry-leading ProtectPoint product (XtremIO and VMAX backing up directly to Data Domain). Dorota even helped SAP design the appropriate backup APIs for SAP HANA; EMC now leads the industry in protecting SAP HANA. Dorota’s work enables our customers to protect their applications however they like.

Dell EMC executives are among the industry’s best. They set our direction and guide the organization through unprecedented company and market changes. Our technical leaders, however, are the absolute best in the industry. It’s always easier to lead when you have the best and most talent. These Dell EMC Core Technologies Technical Directors are just some of the technical talent who deliver the infrastructure solutions that run the world.

~Stephen Manley @makitadremel

How I Learned to Stop Worrying and Love New Storage Media: The Promises and Pitfalls of Flash and New Non-volatile Memory

I tried to avoid learning about flash. I really did.  I’ve never been one of those hardware types who constantly chase the next hardware technology.  I’d rather work at the software layer, focusing on data structures and algorithms.  My attitude was that improving hardware performance raises all boats, so I did not have to worry about the properties of devices hiding under the waves.  Switching from hard drives to flash as the common storage media would just make everything faster, right?

Working for a large storage company broke me out of that mindset, though I still fought it for a few years. Even though I mostly worked on backup storage systems–one of the last hold-outs against flash–backup storage began to see a need for flash acceleration.   I figured we could toss a flash cache in front of hard drive arrays, and the system would be faster.  I was in for a rude awakening.   This multi-part blog post outlines what I have learned about working with flash in recent years as well as my view on the direction flash is heading.  I’ve even gotten so excited about the potential of media advances that I am pushing myself to learn about new non-volatile memory devices.

Flash Today

For those unfamiliar with the properties of flash, here is a quick primer. While a hard drive can supply 100-200 read/write operations per second (commonly referred to as input/output operations per second or IOPS), a flash device can provide 1,000s to 100,000s of IOPS. Performing a read or write to a hard drive can take 4-12 milliseconds, while a flash device can typically respond in 40-200 microseconds (10-300X faster). Flash handles more read/writes per second and responds more quickly than hard drives. These are the main reasons flash has become widespread in the storage industry, as it dramatically speeds up applications that previously waited on hard drives.

If flash is so much faster, why do many storage products still use hard drives? The answer: price.  Flash devices cost somewhere in the range of $0.20 to $2 per gigabyte, while hard drives are as inexpensive as $0.03 per gigabyte.  For a given budget, you can buy an order of magnitude more hard drive capacity than flash capacity.  For applications that demand performance, though, flash is required.  On the other hand, we find that the majority of storage scenarios follow an 80/20 rule, where 80% of the storage is cold and rarely accessed, while 20% is actively accessed.  For cost-conscious customers (and what customer isn’t cost conscious?), a mixture of flash and hard drives often seems like the best configuration.  This leads to a fun system design problem.  How do we combine flash devices and hard drives to meet customer requirements?  We have to meet the IOPS, latency, capacity and price requirements of varied customers.  The initial solution is to add a small flash cache to accelerate some data accesses while using hard drives to provide a large capacity for colder data.
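A back-of-the-envelope sketch using the prices quoted above and the 80/20 rule; the system size and the mid-range flash price are assumptions for illustration only.

```python
# Rough cost comparison for a hypothetical 100 TB system.
TOTAL_GB = 100_000
HOT_FRACTION = 0.20          # the ~20% of data that is actively accessed
FLASH_PER_GB = 0.50          # assumed mid-range flash price (within the $0.20-$2 range above)
DISK_PER_GB = 0.03

all_flash = TOTAL_GB * FLASH_PER_GB
all_disk = TOTAL_GB * DISK_PER_GB
hybrid = TOTAL_GB * HOT_FRACTION * FLASH_PER_GB + TOTAL_GB * DISK_PER_GB

print(f"all-flash: ${all_flash:,.0f}")
print(f"all-disk:  ${all_disk:,.0f}")
print(f"hybrid (flash cache for the hot 20%): ${hybrid:,.0f}")
```

The hybrid line keeps the full capacity on disk and adds flash only for the hot fraction, which is the caching configuration described above.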

A customer requirement that gets less attention, unfortunately, is lifespan. This means that a storage system should last a certain number of years without maintenance problems, such as 4-5 years.  While disk drives fail in a somewhat random manner each year, the lifespan of flash is more closely related to how many times it has been written.  It is a hardware property of flash that storage cells have to be erased before being written, and flash can only be erased a limited number of times.  Early flash devices supported 100,000 erasures, but that number is steadily decreasing to reduce the cost of the device.   For a storage system to last 4-5 years, the flash erasures have to be used judiciously over that time.   Most of my own architecture work around using flash has focused on the issues of maximizing the useful data available in flash, while controlling flash erasures to maintain its lifespan.

The team I have been a part of pursued several approaches to best utilize flash. First, we tried to optimize the data written to flash. We cached the most frequently accessed portions of the file system, such as index structures and metadata that are read frequently.   For data that changes frequently, we tried to buffer it in DRAM as much as possible to prevent unnecessary writes (and erasures) to flash.  Second, we removed as much redundancy as possible.  This can mean deduplication (replacing identical regions with references), compression and hand-designing data structures to be as compact as possible.  Enormous engineering effort goes into changing data structures to be flash-optimized.  Third, we sized our writes to flash to balance performance requirements and erasure limits.  As writes get larger, they tend to become a bottleneck for both writes and reads.  Also, erasures decrease because the write size aligns with the internal erasure size (e.g. multiple megabytes).  Depending on the flash internals, the best write size may be tens of kilobytes to tens of megabytes.  Fourth, we created cache eviction algorithms specialized for internal flash erasure concerns.  We throttled writes to flash and limited internal rearrangements of data (that also cause erasures) to extend flash lifespan.
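To make the last two ideas concrete, here is a toy sketch of an erasure-aware cache: an LRU cache that simply refuses new insertions once a daily write budget is exhausted. It is illustrative only, under my own simplifying assumptions, and is not the Data Domain design.

```python
from collections import OrderedDict

class WriteBudgetedCache:
    """Toy LRU flash cache that throttles insertions to respect an erasure budget."""

    def __init__(self, capacity_items, daily_write_budget):
        self.capacity = capacity_items
        self.write_budget = daily_write_budget   # max insertions (flash writes) per day
        self.writes_today = 0
        self.items = OrderedDict()

    def get(self, key):
        if key in self.items:
            self.items.move_to_end(key)          # keep frequently read data resident
            return self.items[key]
        return None                              # miss: caller falls back to hard drives

    def put(self, key, value):
        if self.writes_today >= self.write_budget:
            return False                         # skip insertion to preserve flash lifespan
        if key not in self.items and len(self.items) >= self.capacity:
            self.items.popitem(last=False)       # evict the coldest entry
        self.items[key] = value
        self.items.move_to_end(key)
        self.writes_today += 1                   # count this flash write against the budget
        return True

    def new_day(self):
        self.writes_today = 0
```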

Working with a strong engineering team to solve these flash-related problems is a recent highlight of my career, and flash acceleration is a major component of the 6.0 release of Data Domain OS.  Besides working with engineering, I have also been fortunate to work with graduate students researching flash topics, which culminated in three publications.  First we created Nitro, a deduplicated and compressed flash cache.  Next, Pannier is a specially-designed flash caching algorithm that handles data with varying access patterns.  Finally, we wanted to compare our techniques to an offline-optimal algorithm that maximized cache reads while minimizing erasures.  Such an algorithm did not exist, so we created it ourselves.

In my next blog post, I will present technology trends for flash. For those that can’t wait, the summary is “bigger, cheaper, slower.”

~Philip Shilane @philipshilane

Data is the currency of the 21st century

For the last 20 years I’ve helped clients design and build Enterprise IT applications and infrastructure to support their businesses. During this time, my focus was on ensuring that data and the business processes IT supports are always available, under any conditions, planned or unplanned.

 

Recently, my focus changed. Last month I joined the Chief Technology Office inside the Infrastructure Solutions Group. For those not familiar, the Infrastructure Solutions Group is responsible for Dell EMC’s Primary Storage and Data Protection Solutions. If you have heard of VMAX, VNX, XtremIO, Unity, Data Domain, NetWorker or Avamar, you are in good company.

 

I’ve shifted from data infrastructure to exploring the value one can derive from data. In my quest to find answers, I have buried my head in research on many topics: conventional business intelligence, data visualization, statistics, machine learning, probabilistic programming and artificial neural networks. Suffice to say it is a lot to take on in a few months (and my head still hurts). What I have come to learn is that data generated by machines, in the course of doing business and interacting with each other, contains value that is seemingly invisible to the human eye. That value needs to be explored and unlocked.

 

Since the beginning of time, the human race has collected data to build a plan and support decisions. Data would be collected from surveys, forms, experiments, tests, optical devices and analogue to digital converters. This data was used to understand the environment around us, to find meaning, and to support objective decision making. However, the amount and diverse nature of the data we could generate was limited by the instruments and methods available to describe the environment in terms computers could process.

 

What changed? Thanks to mobile phones and other portable devices, the world switched from analogue to digital communications.

 

According to IDC, the digital universe we now live in is doubling in size every two years, and by 2020 the data we create and copy annually will reach 44 trillion gigabytes.

 

This digital revolution is having another profound effect. It is fueling a resurgence in Artificial Intelligence.

 

With data streaming into Artificial Intelligence systems from sight (images), sound (microphone) and touch (screen) sensors, we can now understand what people are doing, feeling, saying and seeing. And by viewing the environment from multiple perspectives, Artificial Intelligence can build a greater and more accurate understanding of the environment. Consider a little experiment. Next time you speak to someone, close your eyes and try to imagine how they feel. Are they happy, are they sad, are they calm, are they angry? The human brain can associate emotion from speech, the same way we associate objects through the visual cortex. Now, perform the same experiment, but this time with your eyes open. Are you more confident in your answer now?

 

What may not be obvious is that one of the main ingredients driving the resurgence in Artificial Intelligence is the abundance of rich and diverse data. This form of Artificial Intelligence, better known as Deep Learning, requires access to large amounts of data and computational power to identify patterns and form opinions. Humans then take these opinions to support their motivations and actions. The motivation behind the learning can be anything. For example, a business may want to identify growth opportunities, optimize a system or process, or anticipate and prevent a bad situation.

 

Major organizations including IBM, Google, Facebook, Amazon, Baidu and Microsoft recognize the potential of Artificial Intelligence, and each are accumulating data at massive scale. Much of this data comes in the form of speech, images and text from digital sources such as social media platforms, mobile devices and IoT sensors. What was once seemingly invisible to a human being can now be observed from the data.

 

What does the future hold?

 

Data rich organizations and governments will be able to do a lot of good for society. They will be able to solve crimes, identify disease early, prevent the spread of disease, anticipate social unrest and reduce our dependency on fossil fuels. However, they will also be able to anticipate and take advantage of events before everyone else, such as economic conditions, domestic trends and worldwide movements.

 

If you are in the business of delivering products and services, consider the type of data you need and questions you could ask to obtain a complete understanding of your products and users.

 

If you are engaging with customers and partners, consider the type of data you need and questions you could ask to improve customer satisfaction, simplify interactions, increase efficiency and anticipate next steps.

 

To summarize, data is the currency of the 21st century brought on by the digital revolution. This is fueling an Artificial Intelligence arms race. Those that ask questions of the data and listen will thrive. Those that ignore the data will fall further behind. To remain relevant in this digital era we all need to produce, nurture and listen to the data.

~Peter Marelas @pmarelas