How I Learned to Stop Worrying and Love New Storage Media: The Promises and Pitfalls of Flash and New Non-volatile Memory

I tried to avoid learning about flash. I really did. I’ve never been one of those hardware types who constantly chase the next technology. I’d rather work at the software layer, focusing on data structures and algorithms. My attitude was that improving hardware performance raises all boats, so I did not have to worry about the properties of devices hiding under the waves. Switching from hard drives to flash as the common storage medium would just make everything faster, right?

Working for a large storage company broke me out of that mindset, though I still fought it for a few years. Even though I mostly worked on backup storage systems–one of the last hold-outs against flash–backup storage began to see a need for flash acceleration.   I figured we could toss a flash cache in front of hard drive arrays, and the system would be faster.  I was in for a rude awakening.   This multi-part blog post outlines what I have learned about working with flash in recent years as well as my view on the direction flash is heading.  I’ve even gotten so excited about the potential of media advances that I am pushing myself to learn about new non-volatile memory devices.

Flash Today

For those unfamiliar with the properties of flash, here is a quick primer. While a hard drive can supply 100-200 read/write operations per second (commonly referred to as input/output operations per second, or IOPS), a flash device can provide thousands to hundreds of thousands of IOPS. Performing a read or write to a hard drive can take 4-12 milliseconds, while a flash device can typically respond in 40-200 microseconds (roughly 20-300X faster). Flash handles more reads and writes per second and responds more quickly than hard drives. These are the main reasons flash has become widespread in the storage industry, as it dramatically speeds up applications that previously waited on hard drives.

If flash is so much faster, why do many storage products still use hard drives? The answer: price. Flash devices cost somewhere in the range of $0.20 to $2 per gigabyte, while hard drives are as inexpensive as $0.03 per gigabyte. For a given budget, you can buy an order of magnitude more hard drive capacity than flash capacity. For applications that demand performance, though, flash is required. On the other hand, we find that the majority of storage scenarios follow an 80/20 rule: 80% of the storage is cold and rarely accessed, while 20% is actively accessed. For cost-conscious customers (and what customer isn’t cost conscious?), a mixture of flash and hard drives often seems like the best configuration. This leads to a fun system design problem. How do we combine flash devices and hard drives to meet customer requirements? We have to meet the IOPS, latency, capacity and price requirements of varied customers. The initial solution is to add a small flash cache to accelerate some data accesses while using hard drives to provide a large capacity for colder data.
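
To make the cost trade-off concrete, here is a rough back-of-the-envelope sketch in Python. It uses the illustrative per-gigabyte prices and the 80/20 hot/cold split from this post; the total capacity and the specific flash price point are assumptions for illustration, not a sizing tool or a quote for any real product.

```python
# Rough cost comparison of all-HDD, all-flash, and hybrid (80/20) configurations.
# Prices and the hot/cold split are the illustrative figures from the text.

CAPACITY_TB = 100            # total usable capacity we need (assumed)
FLASH_COST_PER_GB = 0.50     # assume a mid-range point in the $0.20-$2/GB span
HDD_COST_PER_GB = 0.03

def cost(flash_gb, hdd_gb):
    return flash_gb * FLASH_COST_PER_GB + hdd_gb * HDD_COST_PER_GB

total_gb = CAPACITY_TB * 1000
hot_gb = 0.20 * total_gb     # ~20% of the data is actively accessed
cold_gb = total_gb - hot_gb

print(f"all-HDD:   ${cost(0, total_gb):,.0f}")
print(f"all-flash: ${cost(total_gb, 0):,.0f}")
print(f"hybrid:    ${cost(hot_gb, cold_gb):,.0f}  (flash for hot data, HDD for cold)")
```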

A customer requirement that gets less attention, unfortunately, is lifespan. A storage system should last a certain number of years, typically 4-5, without maintenance problems. While disk drives fail in a somewhat random manner each year, the lifespan of flash is more closely related to how many times it has been written. It is a hardware property of flash that storage cells have to be erased before being rewritten, and flash can only be erased a limited number of times. Early flash devices supported 100,000 erasures, but that number has been steadily decreasing to reduce the cost of the device. For a storage system to last 4-5 years, the flash erasures have to be used judiciously over that time. Most of my own architecture work around flash has focused on maximizing the useful data available in flash while controlling flash erasures to maintain its lifespan.
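
As a rough illustration of why erasures have to be budgeted, the sketch below computes how much data can be written per day if a device is to survive a 5-year service life. The capacity, endurance, and write-amplification figures are assumptions for illustration only, not the specifications of any particular device.

```python
# Back-of-the-envelope flash endurance budget: how many gigabytes per day can we
# write if the device must last 5 years? All figures below are illustrative.

capacity_gb = 1000          # a 1 TB device (assumed)
erase_cycles = 3000         # far below the 100,000 cycles of early devices (assumed)
write_amplification = 2.0   # internal rearrangement makes each user write cost more
lifespan_days = 5 * 365

total_user_writes_gb = capacity_gb * erase_cycles / write_amplification
writes_per_day_gb = total_user_writes_gb / lifespan_days
drive_writes_per_day = writes_per_day_gb / capacity_gb

print(f"Budget: ~{writes_per_day_gb:,.0f} GB/day (~{drive_writes_per_day:.2f} drive writes per day)")
```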

The team I have been a part of pursued several approaches to best utilize flash. First, we tried to optimize the data written to flash. We cached the most frequently accessed portions of the file system, such as index structures and metadata that are read frequently. For data that changes frequently, we tried to buffer it in DRAM as much as possible to prevent unnecessary writes (and erasures) to flash. Second, we removed as much redundancy as possible. This can mean deduplication (replacing identical regions with references), compression, and hand-designing data structures to be as compact as possible. Enormous engineering effort goes into making data structures flash-optimized. Third, we sized our writes to flash to balance performance requirements against erasure limits. As writes get larger, they tend to become a bottleneck for both writes and reads, but erasures decrease because the write size aligns with the internal erasure unit (e.g. multiple megabytes). Depending on the flash internals, the best write size may be tens of kilobytes to tens of megabytes. Fourth, we created cache eviction algorithms specialized for internal flash erasure concerns. We throttled writes to flash and limited internal rearrangements of data (which also cause erasures) to extend flash lifespan.
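
A minimal sketch of two of these ideas, buffering writes in DRAM until they can be flushed as large erase-aligned batches, and throttling insertions against a daily write budget, is shown below. The class and parameter names are invented for illustration; this is not the Data Domain implementation.

```python
# Minimal sketch: stage small writes in DRAM, flush them to flash as one large,
# erase-unit-aligned batch, and throttle once a daily write budget is spent.

class FlashCacheWriter:
    def __init__(self, erase_unit_bytes=2 * 1024 * 1024, daily_write_budget=64 * 2**30):
        self.erase_unit = erase_unit_bytes   # flush in erase-unit-sized batches
        self.budget = daily_write_budget     # cap flash writes to protect lifespan
        self.buffer = []                     # (key, data) staged in DRAM
        self.buffered_bytes = 0
        self.written_today = 0               # a real system would reset this daily

    def insert(self, key, data):
        if self.written_today + self.buffered_bytes + len(data) > self.budget:
            return False                     # throttled: skip caching, lifespan wins
        self.buffer.append((key, data))
        self.buffered_bytes += len(data)
        if self.buffered_bytes >= self.erase_unit:
            self._flush()
        return True

    def _flush(self):
        batch, size = self.buffer, self.buffered_bytes
        self.buffer, self.buffered_bytes = [], 0
        self.written_today += size
        # A real system would issue one large sequential write to the device here,
        # so the data lands aligned with the internal erasure unit.
        print(f"flushed {len(batch)} items as one {size}-byte write")
```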

Working with a strong engineering team to solve these flash-related problems is a recent highlight of my career, and flash acceleration is a major component of the 6.0 release of Data Domain OS. Besides working with engineering, I have also been fortunate to work with graduate students researching flash topics, which culminated in three publications. First, we created Nitro, a deduplicated and compressed flash cache. Next, we designed Pannier, a flash caching algorithm that handles data with varying access patterns. Finally, we wanted to compare our techniques to an offline-optimal algorithm that maximizes cache reads while minimizing erasures. Such an algorithm did not exist, so we created it ourselves.

In my next blog post, I will present technology trends for flash. For those that can’t wait, the summary is “bigger, cheaper, slower.”

~Philip Shilane @philipshilane

Data is the currency of the 21st century

For the last 20 years I’ve helped clients design and build Enterprise IT applications and infrastructure to support their businesses. During this time, my focus was on ensuring that data and the business processes IT supports are always available, under any conditions, planned or unplanned.


Recently, my focus changed. Last month I joined the Chief Technology Office inside the Infrastructure Solutions Group. For those not familiar, the Infrastructure Solutions Group is responsible for Dell EMC’s Primary Storage and Data Protection Solutions. If you have heard of VMAX, VNX, XtremIO, Unity, Data Domain, NetWorker or Avamar, you are in good company.


I’ve shifted from data infrastructure to exploring the value one can derive from data. In my quest to find answers, I have buried my head in research on many topics: conventional business intelligence, data visualization, statistics, machine learning, probabilistic programming and artificial neural networks. Suffice to say it is a lot to take on in a few months (and my head still hurts). What I have come to learn is that data generated by machines, in the course of doing business and interacting with each other, contains value that is seemingly invisible to the human eye. That value needs to be explored and unlocked.


Since the beginning of time, the human race has collected data to make plans and support decisions. Data was collected from surveys, forms, experiments, tests, optical devices and analogue-to-digital converters. This data was used to understand the environment around us, to find meaning, and to support objective decision making. However, the amount and diversity of the data we could generate was limited by the instruments and methods available to describe the environment in terms computers could process.


What changed? Thanks to mobile phones and other portable devices, the world switched from analogue to digital communications.


According to IDC, the digital universe we now live in is doubling in size every two years, and by 2020 the data we create and copy annually will reach 44 trillion gigabytes.


This digital revolution is having another profound effect. It is fueling a resurgence in Artificial Intelligence.


With data streaming into Artificial Intelligence systems from sight (images), sound (microphone) and touch (screen) sensors, we can now understand what people are doing, feeling, saying and seeing. And by viewing the environment from multiple perspectives, Artificial Intelligence can build a greater and more accurate understanding of the environment. Consider a little experiment. Next time you speak to someone, close your eyes and try to imagine how they feel. Are they happy, are they sad, are they calm, are they angry? The human brain can infer emotion from speech, the same way we recognize objects through the visual cortex. Now, perform the same experiment, but this time with your eyes open. Are you more confident in your answer now?


What may not be obvious is that one of the main ingredients driving the resurgence in Artificial Intelligence is the abundance of rich and diverse data. This form of Artificial Intelligence, better known as Deep Learning, requires access to large amounts of data and computational power to identify patterns and form opinions. Humans then take these opinions to support their motivations and actions. The motivation behind the learning can be anything. For example, a business may want to identify growth opportunities, optimize a system or process, or anticipate and prevent a bad situation.


Major organizations including IBM, Google, Facebook, Amazon, Baidu and Microsoft recognize the potential of Artificial Intelligence, and each are accumulating data at massive scale. Much of this data comes in the form of speech, images and text from digital sources such as social media platforms, mobile devices and IoT sensors. What was once seemingly invisible to a human being can now be observed from the data.


What does the future hold?


Data rich organizations and governments will be able to do a lot of good for society. They will be able to solve crimes, identify disease early, prevent the spread of disease, anticipate social unrest and reduce our dependency on fossil fuels. However, they will also be able to anticipate and take advantage of events before everyone else, such as economic conditions, domestic trends and worldwide movements.


If you are in the business of delivering products and services, consider the type of data you need and questions you could ask to obtain a complete understanding of your products and users.


If you are engaging with customers and partners, consider the type of data you need and questions you could ask to improve customer satisfaction, simplify interactions, increase efficiency and anticipate next steps.


To summarize, data is the currency of the 21st century brought on by the digital revolution. This is fueling an Artificial Intelligence arms race. Those that ask questions of the data and listen will thrive. Those that ignore the data will fall further behind. To remain relevant in this digital era we all need to produce, nurture and listen to the data.

~Peter Marelas @pmarelas

Build-or-Buy? Why M&A exists

EMC (now Dell EMC) has evolved from a company that sold office furniture to a successful collection of strategically aligned technology businesses selling a variety of information solutions. If we look in the rearview mirror, we find many companies that were successful selling one product, like Prime, or selling into one market, like Wang. Both of those companies and many others faltered and disappeared when the market for their product was disrupted by better technology or when business requirements changed. So what’s different at Dell EMC, you ask?


Dell EMC’s stamina, growth and success result from a combination of home-grown ‘organic’ development, strategic investments, and ‘inorganic’ key acquisitions. Deciding whether to build or buy is a complex decision, but like all good decisions it begins with a challenging question.


“What do we have and what do we need?”

Gap analysis – What do we need?

The build or buy journey starts by taking an objective look at your current products and their trajectories. Dell EMC has teams of very bright folks that look at existing markets and new evolving markets.  We analyze where our solutions win and where they do not. Where are we strongest and where do we need help?  Can we morph existing products to fill the gaps?  How long will it take to move into a new or adjacent market, and what resources are required?


We can’t fill every gap organically <Dell EMC employees rant here>, so we invest in and sometimes acquire external companies. Our M&A teams target adjacent growth markets and disruptive technologies. When in doubt, disrupt thyself. Obvious examples of disruptions and evolutions in the last 10 years include meeting the requirements of large-file applications such as media, genome sequencing and mapping. Equally challenging are extreme database and analytics performance demands from thousands of VDI users and hundreds of stock analysts. Future posts will look at two of Dell EMC’s most successful acquisitions. But first let’s examine organic development:


“Why does Dell EMC spend so much to buy new companies? Couldn’t we build it better and at lower cost?”


If you build it, they will come.


Time to market is a key component of a winning strategy. Lean and agile, startups forge ahead into new fields and technologies with great enthusiasm, vigor and laser focus.  They seek to become bleeding-edge adopters of new technology.  They seek to go where no company has gone before. Often, startups are created by teams who don’t know that what they’re doing is impossible. It’s important to look at the differences between a young startup and an existing market provider.


If you are an existing provider you must answer:


“How can we staff this new project without impacting existing revenue and growth rates?”

“How and where should we invest in cutting edge technology?“

“How can we grow new teams with new skills while maintaining our edge in current products?“


If you are a startup, you simply GET TO WORK!


Startups don’t have existing products or customers and thus have no legacy products to service or support. Startups don’t need to maintain existing product lines while they invent new ones. Startups have the freedom to focus 100% of their investment in disruptive or future technologies in order to succeed. Here is a good article from the Harvard Business Review on innovation budgets. Startup CEOs attract the brightest and most motivated people by dangling potential multi-millionaire status and offering them rides in their Tesla.


Industry secret: (Please don’t tell) Startups are held to a lower standard than existing enterprise companies. Startup concepts are expected to be full of holes if you look closely enough.  If their idea or prototypes are remarkable and differentiated, a unique and valuable startup will often get away with being ‘good enough.’ This is also why companies like Dell EMC must invest huge amounts of money in startups after they are purchased.


Mergers and acquisitions are part of life in a successful, long-lasting company. M&A cannot replace internal innovation, but neither can any company expect to survive purely based on organic investment.  M&A is just as strategic and innovative as anything else in Dell EMC, and we take it just as seriously.


Stay Tuned: Build-or-Buy Part 2: How to turn investments and acquisitions into billions in incremental revenue. Just kidding, competitors: nothing to see here.

Real Part 2: How to be a startup that gets Dell EMC’s attention yet protects itself from the shark-infested waters of big companies.

~Mike Fishman @mike_fishman

Love in the Data Center

I love the data center.

I’ve heard the responses, too:

  • “You have to say that. Dell EMC, for all your ‘cloud’ talk, is still a data center company.”
  • “Sure, and you also love cassette tapes, flip phones, and encyclopedias.”
  • “How long until you tell the kids to get off your lawn?”

First, I can love both the data center and the cloud. Second, never tell me I love tape. Third, I don’t have kids on my lawn because I can’t get rid of the skunks.

I love the data center because I’m not convinced that everything will standardize and I think technology still matters.

Everything is Not Standard

IT infrastructure won’t standardize because applications and governments won’t standardize. CIOs use the “electricity” analogy when talking about the data center. They want to consume IT as a commodity. The first problem with that aspiration – application developers.

Application developers block standardization because they are the “talent” in modern businesses. Every business is pivoting to become a technology company – e.g. Tesla is “a sophisticated computer on wheels” and lightbulb companies have become software companies for lighting. Therefore, application developers become business-critical talent. When you’re the “talent”, you get what you ask for (witness every professional athlete). This is especially true in organizations where nobody really understands what they do. When an application developer asks for a unique performance profile to support a hot new business application, she will get it. When an application developer needs a nonstandard network configuration, he will get it. When an application developer must store data differently than everybody else, it will happen. Artists are the biggest roadblock to conformity; application developers are the artists in most companies that are becoming “software companies that do X”.

Government compliance regulations will also block standardization. Fifteen years ago, the “highly regulated” IT organizations were in federal government, health care, and financials. Today, every company faces complex regulations. Those regulations vary across countries, and they’re always changing. I’ve met multiple organizations with teams of lawyers who manage regulations – and they still get things wrong. Finally, as businesses span industries and geographies, the compliance expectations can even conflict! In a world where politicians know more about making headlines than they do about technology, standardization can’t happen.

Executives want to treat IT like a commoditized utility. The difference between electricity and IT infrastructure is data. Application developers want to do creative things with data. Government organizations want to regulate data access and retention. As long as the talent and the regulator both expect special treatment, IT-as-a-utility is a myth.

Technology Still Matters

The other nemesis of standardization is innovation. If users demand something different, but everything is effectively the same, then there is little value in trying to bypass standards. In our industry, however, things are still changing rapidly. Hardware innovation drives software innovation, and we’re in the midst of relentless hardware turnover.

The storage media upheaval seems to be accelerating. Ten years ago, Data Domain declared that “Tape is Dead”. Dell EMC declared 2016 to be “The Year of All Flash” for primary storage. Many IT organizations think this is a time to take a deep breath because history says it will be another decade before the media shifts again. I think the next disruption will begin in the next 3 years, not in a decade. The “Disk is Dead” (All-Flash Protection Storage) and “Non-Volatile Memory” (where I/O moves closer to the application) revolutions are coming.

Analytics has also transformed companies’ relationship to IT infrastructure. Successful organizations mine as much information as they can – about their customers, their teams, their processes, and their interactions. When running analytics, the most important ingredient is – DATA. What data are people accessing? What region is leveraging different services? What applications are getting the most load at different times of day? Good companies ask how they can improve every aspect of their business. Great companies answer those questions with concrete data, and then take action. How often have you wanted to know more detail on what was happening somewhere? When you cede control of your IT infrastructure, you lose access to its telemetry. In a world where data and metadata are your most precious assets, why let somebody else have them?

Technology still matters. Performance, cost, scale, and functionality can change in a matter of months. Those changes can mean the difference between launching an application and failing to meet the ROI goals. Meanwhile, analytics enables businesses to better understand how they run internally and connect with customers externally, on a global scale.

To control your destiny, sometimes you need to control your IT infrastructure.


I love the data center. I love building products that power the data center. It has provided the infrastructure for unparalleled growth and invention around the world. Obviously, we need to simplify the data center technologies – to enable our users to deliver value more quickly to their customers. We need to ensure that we’re not simply adding value-free features or products. But the data center is here to stay.

With all that, I love the cloud. There is enormous value in standardized cloud applications and infrastructure. It’s a great way to develop, explore, and scale. It’s ideal for a variety of applications. But with all the “prodigal son” love that cloud gets, sometimes it’s important to remind the first son how much you love him.

When you choose to standardize, you settle for the least common denominator. In a world filled with constantly changing demands (developers and regulators) and constantly changing supply (innovation and analytics), are you sure you’re ready to settle?

~Stephen Manley @makitadremel

How To Get Things Done

“How can we get anything done across products?”

That was the theme of the 2016 EMC Core Technologies Senior Architect Meeting. Every year, we gather the senior technical leaders to discuss market directions, technology trends, and our solutions. This year included evolving storage media, storage connectivity, Copy Data Management, Analytics, CI/HCI, Cloud, and more. While the technical topics generated discussion and debate, the passion was greatest around – “How can we get anything done across products?” Each Senior Architect got to their position by successfully driving an agenda in their product groups, so they find their lack of cross-product influence to be exceptionally frustrating.

While the challenge may sound unique to senior leaders in a large organization, it’s a variant of the most common question I get from techies of all levels across all companies: “How can I get things done?”

What’s the Value?

Engineers – if your idea does not either generate revenue or save cost, you’re going to have a difficult time generating interest from business leaders, sales, and customers. Everybody loves talking about exciting technology, but they pay for solutions to business problems.  Too often, engineers propose projects that customers like, but would not pay for.

An internal team once proposed a project that would make our UI look “cooler”. I asked what it would do for the customer. It wouldn’t eliminate a user task. It wouldn’t help them get more done. But they were convinced it would be more “fun” which would convince more enterprises to buy the product. Not surprisingly, we didn’t pursue that project.

I recently met a startup with very exciting technology, but I couldn’t see how or why anybody would pay for it. The founder looked me in the eye and said, “People will love it so much, that they’ll just send me checks in the mail. But I’ll only cash the big ones, since smaller companies shouldn’t have to pay.” I started laughing at his joke, then felt really guilty (OK, sort of guilty) when I realized he was serious.

As you think about your value, it’s preferable to focus on revenue generation. Customers and executives would rather invest in solutions that increase their revenue than in those that save costs. Cost-saving discussions are either uncomfortable (and then you lay off ‘n’ people) or hard to justify (if you spend a lot of money today, you’ll save even more… in three years). On the other hand, everybody likes talking about generating new revenue.

My Executive Briefing Center sessions often come after either Pivotal or Big Data discussions. The customers are excited about CloudFoundry, analytics, and new development techniques because it allows them to more quickly respond to customers and generate new revenue streams. As I walk in, they’re excitedly inviting the Pivotal presenter to dinner. After I discuss backup or storage, they say, “Thanks, this should help us reduce our costs. We still wish it weren’t so expensive, though.” Oh, and they NEVER invite me to dinner. Because nobody likes the “cost cutting” person. Or nobody likes me. Either one.

What are the Alternatives?

Technical people tend to make three mistakes when pitching an idea.

Mistake 1: Leading the audience through your entire thought process.

First, most senior people don’t have the attention span (I blame a day full of 30 minute meetings) to wait for your conclusion. Quickly set context, then get to the conclusion. Be prepared to support your position, but let them question you; don’t pre-answer everything. Second, most people don’t problem solve the same way you do, so your “obvious” thought path may not be clear to others. Finally, the longer you talk, the less likely you are to have a conversation. Your audience wants to be involved in a decision; that only happens when they can express their viewpoint and know that you’ve understood it.

Mistake 2: Not presenting actions

Let’s say you’ve made an astounding presentation. The audience is engaged. You’ve had a great discussion. Everybody supports the conclusion. And… you walk away. Too often, engineers forget to add: “And here’s what we need to do.” If you don’t ask for something to be done, nothing will be done.

Mistake 3: Not presenting alternatives

People and executives (some of whom display human characteristics) want to feel like they have some control over things. That means they want to be able to make choices. They also want to believe that you, the presenter, have considered many alternatives before drawing your conclusion. To satisfy both needs, you must present two or three (more than that and it’s overwhelming) legitimate approaches that address the challenge. If you don’t, they’ll feel like you’re trapping them.

One of my worst presentations was titled – “Large file system restores are slow.” I spent an hour walking through 23 slides detailing the pain of restoring large file systems (both by capacity and file count). At the end, the Sr. Director said, “We knew it was slow. That’s why we hired you. Are you saying that we can’t hire someone to solve this, or that we just made the wrong hire?” Now THAT is an example of quickly presenting actionable alternatives.

Who are You Selling To?

As you sell your idea, you need to tailor the pitch to your audience.

  • What actions can you ask for? If your audience doesn’t control resources or roadmaps, then ask them for what they can give – support, personal time, etc. Conversely, if your audience can make decisions, ask for the decision. It’s better to get a “no” than to drift forever.
  • What does your audience care about? Business leaders want to hear about revenue, routes to market, investment costs, etc. Your demo may be the coolest thing ever, but it won’t move them until you get them interested. Technical leaders generally care about both, but be careful about losing them on a deep dive. Technical experts want the deep dive. Engineers want to know what work they need to do.
  • What is their background? If you’re selling an idea to non-experts, you’ll need to spend more time setting context (business, technical, etc.). If you’re talking to experts, don’t waste their time with the basics.

In other words, there is no “one size fits all” presentation. It may be more work to tailor your approach to each audience, but nobody said this was easy.

When I first started working with customers, I would race through my presentation – always doing it the same way. I was too nervous to ask what the audience was interested in hearing. As I talked, I’d never give the audience a chance to respond. I considered myself lucky if the audience sat in silence, so that I could quickly exit, drenched in sweat. One day, I walked into the Briefing Center, saw 2 people in suits sitting there, and rattled through my 30 minute talk. At the conclusion, one of them said, “That was good. That was a lot of the content we want to cover. Just so you know, the customer is running late, but they should be here soon.”


How do you get things done? You convince people. You need to convince business leaders, peers across groups, technical experts, and the engineers who will actually do the work. Whether you’re a new college graduate or a technical leader with decades of experience, the formula doesn’t change:

  • What’s the value?
  • What are the alternatives?
  • Who is the audience?

If you follow these guidelines, you may not always get the decision you like… but you will get a decision. And “getting decisions about actions” is the only way you can get anything done.

-Stephen Manley @makitadremel

The Origins of eCDM: New Technologies & Methodologies

EMC recently announced Enterprise Copy Data Management (eCDM), a product that enables global management and monitoring of copy data across primary and protection storage. Perhaps just as interesting as the product itself is the way that the product was conceived, designed, developed, and taken to market. Like the trailblazers of the west, the product team behind eCDM was faced with the daunting challenge of exploring uncharted territory. They created a product from scratch using agile methodologies, open source technology, a brand new UI, and an entirely custom go-to-market strategy.

This is the second post in a series that details the challenges and successes of the product team from conception to release. The first post can be found here.

Agile methodologies, open source software, and an intuitive user interface are expected of modern software today. However, there is no simple, well-worn path from long-standing traditional development processes to these signature software traits. Each deviation from an existing methodology requires a clear business justification, and in turn, a clear benefit for customers. The eCDM team embraced these concepts and worked diligently to excel at agile, test and utilize open source components, and build an attractive, effective user interface – all while emphasizing the customer experience.

When people think of agile today, it’s easy to accept it as a foundational piece of modern software development. Marina Kazatcker, the lead engineering program manager for eCDM, explained to me that the eCDM team had planned to use agile since the conception of the product. However, it’s important to note why the decision was so clear: other teams within EMC were already using agile methodologies and were seeing great results from the process. Teams were able to effectively measure their progress and keep track of their assigned stories. Most importantly, as Marina and the team noted, agile would allow eCDM to iterate quickly with higher quality, to the benefit of end users. An agile development process means quicker bug fixes, more releases, and more robust code. With these benefits in mind, Marina and the team welcomed agile and began the journey with 15 scrum teams (editor’s note: With 15 scrum teams, eCDM jumped into the agile deep waters right away!), each addressing a separate feature or component of the product.

Around the same time as those agile conversations, engineering teams were debating the use of various open source tools within the eCDM product. Amrit Jain, a software engineer for eCDM, explained the importance of leveraging open source in modern software products. Before open source was widely adopted by enterprises, engineering teams shared an emphasis on developing everything in-house. However, Amrit highlighted the reason for choosing an open source option over an internal solution: “we’re not in the business of building smaller components; we’re in the business of building our product.” Re-using components that already exist in the open source community allows the team to focus on what really matters: the product. Open source solutions enable products to be built more quickly, and the communities surrounding these components keep them robust and battle-tested.

While Amrit was focusing on back-end open source components, the user interface team for eCDM was carefully planning their choice of both technology and design. Skip Hanson, a senior manager responsible for the eCDM user interface, understands the importance of an intuitive user interface to complement underlying technologies in a modern software product. The team focused on ease of use to provide the best possible experience for the end user. “Enterprise software usability shouldn’t be any different than any other software,” Skip explained to me. It makes sense – the same people that use consumer software will be using enterprise software. Even though enterprise software is more complex by nature, the design shouldn’t reflect that complexity. For that reason, Skip and his team chose modern web tools and design standards to develop the eCDM user interface with a focus on the customer. As our customers have repeatedly cheered about EMC’s new UIs: “No more Java!”

It’s clear that each of the technical decisions made by the eCDM team reflects a nearly obsessive focus on the customer’s experience with the product, despite the challenges associated with deviating from existing development processes. Whether it is faster development cycles, more robust product components, or a great user experience, the decisions to use modern methodologies and technologies enable eCDM to respond more effectively to the needs of the customer.

Marina Kazatcker is the lead engineering program manager for eCDM. She began her career as a software engineer developing algorithms for medical devices. For the last 15 years, she has worked in various software development and leadership roles in the tech industry, and for the last 5 years she has led the Agile transformation in product development across different programs.

Amrit Jain has worked at EMC for 3+ years as a consultant software engineer, with 15+ years of hands-on experience in architecture, technical design, and development of various large-scale cloud and web-based services and applications. Amrit has worked in various software development and leadership roles at Cisco, Oracle, and Ocwen.

Skip Hanson is a 20+ year veteran of EMC and unapologetic UX evangelist. He has spent his whole career in UI/UX design, development and management. He is currently responsible for the UX of eCDM. In past lives he has worked to improve user experience in many projects including: BRM, BRM mobile, NetWorker, Avamar, and Raptor. He is currently Senior Manager for the best group of UX hackers and creatives he’s ever met.

Tyler Stone @tyler_stone_

The Evolution of Cybercrime: Ransomware and the need for an Isolated Recovery Solution

Hope is not a strategy.


It takes some fortitude to begin a blog with such a well-worn cliché, but in this case it is more than fitting. With the emergence of ransomware and hacktivism as a rapidly growing new category of threats, all too often the response is hope.


Hope the hackers don’t attack us. Hope we can detect and shut down an attack before it does too much damage. Hope that by upgrading our perimeter defenses and educating our employees, we will be a harder target.


Good luck with that.


The 2015 Data Breach Investigations Report revealed that over 60% of companies could be compromised in six minutes or less. EMC has conducted two global data protection surveys in the past two years, and the resulting data is equally worrisome. Approximately one third of all customers have experienced data loss due to a security breach, and the average cost of a single incident is $914,000, which in Dr. Evil terms is really, really close to “One Million Dollars!”


The statistics are out there for you to find if you want a greater dose of shock and horror. (Sometimes I wonder given all the travel I do why I watch the television show “Why Airplanes Crash”, but we are all drawn to the macabre and disturbing, right?)


But the reason I’m writing this isn’t just to put forth the disturbing data again. Instead, it’s to point out that many groups are looking in the wrong place for solutions.


While the threats are evolving, the Information Security community, vendors and IT professionals are all doubling down on more of the same. All of the emphasis is on incident prevention, and very little attention is given to data protection and recovery. The facts, however, lean heavily toward the probability that any given business will be the subject of a successful cyber-attack in the foreseeable future. In fact, most security experts agree that there are two kinds of companies: those who have experienced a successful cyber-attack, and those who have experienced a successful cyber-attack and simply don’t know it yet.


As these threats have emerged, guidance in the form of Cyber Security Frameworks (CSFs) has been established. Most of these CSFs reference or borrow heavily from the NIST CSF, which contains some very important fundamental guidance. “Protect” and “Recover” are both pillars of the NIST CSF, as well as of every other highly regarded framework. In this context, “Protect” explicitly means making secure, isolated protection copies of data. “Recover” means that after detecting a threat, a business needs a plan to suspend production, eliminate the threat, and recover the data and systems necessary to resume operations.


EMC has a solution, called the “Isolated Recovery Solution”, for customers who want to protect themselves from these modern, sophisticated threats. Logically, it provides an “air gap” between the production environment and the data that is critical for the survival of any business. Here is a diagram showing the key elements of the EMC Isolated Recovery Solution:


[Figure: key elements of the EMC Isolated Recovery Solution]


Isolated Recovery is NOT Disaster Recovery. IR represents only the most critical data that a business needs to survive.  Most customers do not stand up an IR solution at the same size and scale as their primary backup or DR infrastructure.  Furthermore, IR usually lives in the same location as the production data.  This is essential for rapid recovery from an incident.  The mechanisms IR puts in place are both physical and logical.  For many enterprises this means creating a small locked cage within the data center and restricting access to the very few people who are responsible for the system.
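
To illustrate the operational pattern described above, here is a minimal sketch of one air-gap cycle: open the link only for a short, scheduled window, copy and lock the latest protection copies in the vault, and close the link again. The function names are hypothetical placeholders passed in by the caller, not a real Dell EMC API.

```python
# Minimal sketch of an air-gap cycle. The enable/replicate/lock/disable callables
# are hypothetical placeholders supplied by the operator's own tooling.

import time

def air_gap_cycle(enable_link, replicate, lock_copy, disable_link, window_seconds=3600):
    enable_link()                        # physically/logically open the connection
    deadline = time.time() + window_seconds
    try:
        copy_id = replicate()            # pull the latest protection copies into the vault
        lock_copy(copy_id)               # retention-lock the copy so it cannot be altered
        if time.time() > deadline:
            raise RuntimeError("replication exceeded the air-gap window")
    finally:
        disable_link()                   # the vault goes dark again, even on failure
```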


There is a lot more to talk about when it comes to the IR solution, so I encourage you to reach out to one or more trusted advisors, prioritize what matters to you, and then dive into all the details.


But what about my earlier statements about not hanging all our hopes on prevention? Prevention is useful, but a successful defense requires a layered approach.  All of the parts of the solution need to fulfill a purpose within the CSF.  But how are you organized as a company?  When was the last time the Information Security team and the Backup team got together and built a joint solution?  My guess is probably never.  Traditional organizational structures simply aren’t aligned very well in order to solve this problem.  In order to defend against a threat that evolves rapidly, everyone needs to adapt.  Whether it is the backup guru talking to the firewall guru, or the CIO talking to the Board of Directors, someone needs to start having this conversation.  You are that someone.  You can’t just hope that somebody else will do it. Hope is not a strategy.


Rich Colbert

Cloud Native for The Enterprise

In Part I of this series, we explored how the Heroku architecture wires middleware software into its platform, allowing developers to focus on writing applications to drive their business. In this part, focusing on the enterprise use case, we’ll analyze the Heroku-inspired PaaS system – Cloud Foundry.


Part II – Cloud Foundry – Cloud Native for the Masses


Cloud Foundry – The PaaS for the Enterprise


Cloud Foundry (CF) is considered by many to be the PaaS for the enterprise – and for good reasons. With mature APIs, built-in multi-tenancy, authentication and monitoring, it’s no wonder vendors like IBM, HP and GE built their platforms as certified CF distributions. Cloud Foundry can be thought of as a customizable implementation of the Heroku architecture. The main difference between Heroku and CF is the flexibility that allows CF to be installed anywhere. Like Heroku, CF adopted the strict distinction between the stateless 12-factor part of the app (“the application space”) and the stateful part of the app (“Services”). While the application space is very similar to its equivalent on Heroku, CF can’t depend on Heroku engineers to manage the services (e.g. databases/messaging). Instead, CF delegates this role to DevOps. As a result, enterprises can configure the platform to expose custom services that meet their unique needs. This sort of adaptability plays in favor of CF in the enterprise world.


Some of the major Cloud Foundry qualities that make it attractive in this space:


Avoid lock-in – With years of experience being locked in by software and infrastructure stacks, enterprises seek freedom of choice. With the CPI abstraction (Cloud Provider Interface), Cloud Foundry (using BOSH) can be deployed on practically any infrastructure.


Mature on-premises use cases – The cloud is great! Enterprises are not passing up on that trend. However, reality has its own role in decision making, and many workloads are staying on premises. While security, regulations and IT culture are often cited, what keeps a large portion of the workloads on premises is the years and years of legacy systems holding the most important asset of any organization – its data. In order to move mission-critical workloads to new architectures in a remote datacenter (e.g. cloud), organizations have to port all the proprietary, non-portable data too. Translating for readers with cloud native mindsets: DB2 on mainframe is not yet “Dockerized”; one can’t simply push it around in the blink of an eye.


Good Transition Story – CF can do more than run legacy workflows on premises. The big difference is that it provides a well-defined architectural transition story. The ability to move parts of the app or new modules to run as CF apps, while easily connecting to the legacy data systems as services (via service brokers; see the sketch after this list), is powerful. This allows developers to experiment with the modern technology while accessing the core systems, giving them a real opportunity to experience the cloud native productivity boosts while keeping their feet firmly on the ground.


Compliance, Compliance and again Compliance – Many cloud native discussions leave out where data is being stored. We often hear that cloud native apps only use NoSQL or Big Data databases. While using modern databases makes sense in some use cases, in order to deliver compliant applications, organizations find it easy and safe to use mature database back-ends (e.g. Oracle/MS SQL) for serving their modern cloud native apps. With Cloud Foundry’s Service Brokers model, they are able to leverage the tools and processes they are already proficient with to protect their data, while modernizing their apps and workflows.


Vendor behind the platform – Although Cloud Foundry is open source, enterprises like having a throat to choke. Proprietary distributions like Pivotal Cloud Foundry support the transformation and can be engaged when customers encounter challenges.
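
To make the service broker idea from the transition story concrete, here is a minimal sketch of the catalog endpoint a broker exposes (GET /v2/catalog in the Open Service Broker API) to advertise a legacy database to the CF marketplace. It uses Flask, and the service name, plan, and IDs are illustrative placeholders; a complete broker would also implement the provision and bind endpoints.

```python
# Minimal sketch of a service broker catalog endpoint that advertises an existing,
# DBA-managed database as a bindable CF service. Names and IDs are placeholders.

from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/v2/catalog")
def catalog():
    return jsonify({
        "services": [{
            "id": "11111111-2222-3333-4444-555555555555",
            "name": "legacy-oracle",
            "description": "Connections to the existing on-premises Oracle estate",
            "bindable": True,
            "plans": [{
                "id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
                "name": "shared-schema",
                "description": "A schema on an existing, DBA-managed instance"
            }]
        }]
    })

if __name__ == "__main__":
    app.run(port=8080)
```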


While the motivation to modernize exists in almost every organization these days, not seeing a clear path for transformation can hold companies back. Many of the modern platforms (e.g. Kubernetes or Docker Datacenter) have an appealing vision for where the world of software needs to be, but it is not clear how to make a gradual transformation. By adopting CF, enterprises see a steady path that they can pursue and start getting results relatively quickly.


Cloud Foundry – Trouble in Paradise


Cloud Foundry is an exciting platform that can bring many benefits to organizations adopting it. However, there is always room to improve. CF’s flexibility in handling stateful services via the “Service Brokers” abstraction and DevOps management is what makes the “transition story” possible. As always, though, making something more generic involves tradeoffs. Because Heroku manages both stateful and stateless services, it retains full control over the platform. Without that control, some pain points emerge.

[Figure: Cloud Native New Silos]


On one hand, the application space is modern and fully automated, while on the other hand the services space and its integration points with the app space have quite a manual feel. It’s no surprise that the platform’s shortcomings revolve around the lack of control and lack of visibility caused by the new “DevOps Silo” the CF architecture enforces. For example:


Scale out – Cloud Foundry does an impressive job scaling out the stateless services according to the load on the application. But often, once the compute bottleneck has been resolved, the next bottleneck becomes the database. If the platform could control stateful services as well, it could also scale the database resources. Today, developers have to pick up the phone and call DevOps to ask for specific service tuning.


Multi-Site HA – Production systems often run on more than a single site due to HA requirements and/or regulations. In order to support such deployment topologies, someone has to make the stateful services available on multiple sites when required. As the CF runtime ignores stateful services, there has to be an out-of-platform process for orchestrating them. From the CF perspective, each site runs a different application, and the coupling of those sites is not visible to CF. Having seen such deployment topologies in real life, I can testify that this is painful and error prone.


Portability – Great! CF is portable! Is it? Let’s say I run Facebook on my datacenter using CF. It would be effortless to point my CF client at AWS and push Facebook to a CF instance running there. Job done. Or is it? If you have been following so far, you might have noticed that in my new Facebook production site on AWS, I’m going to be very lonely. In fact – I’m not going to have any friends! While the stateless services are portable, the stateful ones, which more than anything are the core value of any application, are certainly not portable.


Test & Dev – When new architectures emerge, they often focus on solving the primary use case of running in production. Only later, as the technology matures, do the toolsets and patterns for application maintenance arrive. To be fair, Cloud Foundry does have nice patterns for the application lifecycle, like blue-green deployments. However, this is not enough, especially in the enterprise. When maintaining an application, developers often need access to instances with fresh data from production. In the past, they had to ask a DBA to extract and scrub the data for them, or at best had a self-service way of doing this. In the modern world of micro-services, applications tend to use multiple smaller databases rather than one monolith. In addition, the new DevOps silo means that there is one more hop in the chain for this already complex operation. I’ll elaborate on the developer use case in the next post.


Data protection – While I have heard more than once that data protection is not required in cloud native environments, where the data services replicate data across nodes, I’ve also witnessed organizations lose data due to human error (expiring entire S3 buckets that had no backup) or malicious attacks (e.g. dropping an entire schema). When a crisis happens, organizations need the ability to respond quickly and restore their data. The CF platform hides the data services from us and creates yet another silo in the already disconnected world of data protection: storage admins, backup admins, DBAs and now – DevOps.


Cost Visibility – When adopting modern platforms, more and more processes become automatic. This empowers developers to do more. However, with great power (should) come great responsibility. Companies complain that in large systems they lose visibility into the resource cost per application and use case. While CF lets you limit the amount of compute resources an application consumes, you get no control over or visibility into the cost of stateful services (which in many cases is the most significant). For example, developers can run test-dev workloads attached to expensive stateful services when they could have used cheaper infrastructure for their test workloads. With Big Data analytics services there is also a lack of visibility in production: there is no way to distinguish which app is utilizing what share of the shared resource, and therefore no ability to prioritize and optimize.

[Figure: Native Application Data Distribution example]


As the technology matures and more organizations take it to production, the community will start catching up with solutions. Some initiatives are already being thrown into the air (e.g. BOSH releases for stateful services) and offer some enhanced features (although always with trade-offs). I’ve recently seen a demo by Azure architects that runs Pivotal Cloud Foundry in Azure. Since they have full control of the stateful services, they could preconfigure the Cloud Foundry marketplace to work seamlessly. Even more impressive, they stated they are working on supporting this configuration on premises using Azure virtualization software on customer hardware. With the tradeoff of being locked in with Microsoft, having the ability to control and get visibility into the full spectrum of the application is certainly an attractive offering.



Cloud Foundry is bringing the Cloud Native environment to the enterprise ecosystem. By creating a more flexible model that depends on Dev Ops, it solves some of the challenges with bringing the Heroku model to the enterprise. Of course, there are new complexities that arise with the Cloud Foundry model. In particular, adding Dev Ops makes the stateful (i.e. persistent data) operations more challenging – especially for developers.


In the next article we’ll discuss the role of developers in the Cloud Native Enterprise. While cloud native applications empower developers to do more, in the enterprise world there are implications that create an interesting dissonance.


Amit Lieberman @shpandrak, Assaf Natanzon @ANatanzon, Udi Shemer @UdiShemer

Deduplicated Storage, Years after the Salesperson Leaves

So you are thinking about purchasing a deduplication storage system for your backups. Everyone tells you that, by removing redundancies, it will save storage space, and reduce costs, power, rack space, and network bandwidth requirements. Perhaps a vendor even did a quick analysis of your current data to estimate those savings. But, many months from now, what will your experience really be? We decided to investigate that question.

I was fortunate enough to collaborate with researchers studying long-term deduplication patterns, and our work was recently published at the Conference on Massive Storage Systems and Technology (MSST 2016)[1].

The Study

How do you perform a long-term study of deduplication patterns?

  • Step one: Create tools to gather data.
  • Step two: Gather data for years. And years. And years.
  • Step three: Analyze, analyze, analyze. Then upgrade your tools to analyze some more.

Our data collection tools were designed to process dozens of user home directories in a similar fashion as an actual deduplicated storage system. The tools ran daily from 2011 through today, with infrequent gaps due to weather-related power outages, hardware failures, etc. Every day, the tools would read all the files, break each file into chunks of various sizes (2KB – 128KB and full file), calculate a hash for each chunk, and record the file’s metadata.
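
Here is a minimal sketch of the kind of processing the collection tools performed: walk a file tree, split each file into fixed-size chunks, hash each chunk, and report a deduplication ratio. The real tools also recorded file metadata and used a range of chunk sizes (2KB-128KB and whole-file); this sketch is only an illustration, and the 8KB chunk size is an assumption.

```python
# Minimal sketch of chunk-based deduplication analysis over a directory tree.

import hashlib
import os

def dedup_ratio(root, chunk_size=8 * 1024):
    total = unique = 0
    seen = set()
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    while True:
                        chunk = f.read(chunk_size)
                        if not chunk:
                            break
                        total += len(chunk)
                        digest = hashlib.sha256(chunk).digest()
                        if digest not in seen:
                            seen.add(digest)      # first time we see this content
                            unique += len(chunk)
            except OSError:
                continue  # skip unreadable files, as a long-running collector must
    return total / unique if unique else 1.0

if __name__ == "__main__":
    print(f"deduplication ratio: {dedup_ratio('.'):,.2f}X")
```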

In this paper, we analyzed a 21-month subset of the data covering 33 users, 4,100 daily snapshots, and a total of 456TB of data. If this isn’t the largest and longest-running deduplication-oriented data set ever gathered, it is certainly among the top few. Analyzing the collected data was a gargantuan task involving sorting and merging algorithms that ran in phases, writing incremental progress to hard disk since the data was larger than memory. For anyone who wants to perform their own data collection or analysis of this immense data set, the tools and data set are publicly available.


The Results

What did we discover? Starting with general deduplication characteristics, we find that whole file deduplication works well for small files (<1 MB) but poorly for large files. Since large files consume most of the bytes in the storage system, this result supports the industry trend towards deduplicating chunks of files. We then looked at optimal deduplication chunk sizes based on the type of backup schedules.

  • Weekly full and daily incremental backups: optimal chunk size is 4-8KB
  • Daily full backups: optimal chunk size is 32KB

Deduplication ratios also varied by file type. Unsurprisingly, compressed files such as .gz, .zip, and .mp3 get little deduplication no matter the chunk size, while source code had the highest space savings because only small portions are modified between snapshots.  Interestingly, VMDK files took up the bulk of the space, highlighting that storage systems need to be VM-aware.

Looking at the snapshots for our 33 users, we found significant differences that are hidden by grouping all the users together. Per-user deduplication ratios varied widely from a low of 40X (meaning their data fits in 1/40th of the original space) to a high of 2,400X!  That outlier user had large and redundant data files that could be compacted dramatically through deduplication.

While much of the redundancy is related to the large number of user snapshots, increasing the number of preserved snapshots (i.e., the retention window) does not linearly increase the deduplication ratio because of metadata. For example, we saw deduplication as high as 5,000X for the outlier user before considering the impact of file recipes. A file recipe is internal metadata needed to represent and reconstruct the files in their deduplicated format. The recipe consumes a tiny percentage of the space compared to the actual file data. However, even when data isn’t changing, we still need a recipe for each file. This led to an interesting conclusion. We found that deduplication ratios tend to grow rapidly for the first few dozen snapshots and then grow more slowly because file recipes become a larger fraction of what is stored. There are even cases where deduplication ratios dropped because a user added novel content.
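
To see why recipes keep the ratio from growing linearly, here is a toy model in Python. The per-snapshot data and recipe sizes are invented for illustration; they are not figures from the study.

```python
# Toy model: even when snapshots add no new data, each snapshot still stores file
# recipes, so the deduplication ratio grows sub-linearly with the retention window.

logical_per_snapshot_gb = 100.0   # data referenced by each daily snapshot (assumed)
unique_data_gb = 100.0            # assume the data never changes after snapshot 1
recipe_per_snapshot_gb = 0.05     # small, but paid for every snapshot (assumed)

for snapshots in (1, 10, 100, 1000):
    logical = logical_per_snapshot_gb * snapshots
    stored = unique_data_gb + recipe_per_snapshot_gb * snapshots
    print(f"{snapshots:5d} snapshots: ratio {logical / stored:8.1f}X")
```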

Finally we looked at grouping users together to see if there are advantages to deduplicating similar users together. We found that there are pairs of users that overlapped as much as 44%, which can directly turn into space savings by placing those users on the same deduplication system. Intelligent grouping and data placement can improve customers’ deduplication results.

These findings can provide guidance to storage designers as well as customers wondering how their system will respond to long-term deduplication patterns. Customers and designers should think about matching chunk sizes to backup policies, metadata management at scale, and grouping together similar users’ data. There are many more details in the paper, and researchers are welcome to use our tools and data set to perform their own analysis.

[1]Zhen Sun, Geoff Kuenning, Sonam Mandal, Philip Shilane, Vasily Tarasov, Nong Xiao, and Erez Zadok. “A Long-Term User-Centric Analysis of Deduplication Patterns.”  In the Proceedings of the 32nd International Conference on Massive Storage Systems and Technology (MSST 2016).


-Philip Shilane @philipshilane

The Robot Rock Cortex

Hurtling through the IT multiverse on the leading edge of a ray of light this week Inside the Data Cortex:

  • Mark wants to replace everyone with a robot. Including The Rock.
  • Stephen rejects this and believes The Rock is the biggest movie star in the world. Both worry about Nick Nolte…and Ricky Martin?!?
  • Mark’s hero? Attila The Hun. Then he kills the vibe by deciding this week we’re talking about APIs. It is so disappointing.
  • Ransomware, Isolated Recovery Services, APIs for Services Providers, level based targeting, the stuff which will never be standardised, test automation and doubling your salary to slum it.
  • This time in books, Stephen wades further into the creation of the United States in “Revolutionary Summer” while in “The Great Crash, 1929” Mark discovers we are not stupid, just human. A Neal Stephenson recommendation, Asimov did it before you and in “The Price of Prosperity” the Emperor Augustus puts a tax on the childless.

Download this episode (right click and save)

Subscribe to this on iTunes

Follow us on Pocket Casts
Stephen Manley @makitadremel Mark Twomey @Storagezilla