Data Protection for Public Cloud Environments

In late 2015 I was researching the options available to protect application workloads running in public cloud environments. In this post I will discuss my findings, and what we are doing at Dell EMC to bring enterprise-grade Data Protection solutions to workloads running in public cloud environments.


To understand how Data Protection applies to public cloud environments, we need to recognize that Data Protection can occur at different layers in the infrastructure. These include the server, storage, hypervisor (if virtualized), application and platform layers. When we implement Data Protection for on-premises environments, our ability to exercise Data Protection functions at any one of these layers depends upon the technologies in use.


At the server layer, we typically deploy an agent-based solution that manages the creation of Data Protection copies of the running environment. This method can be used for virtualized, bare metal and even containerized environments that persist data.


At the application layer we typically rely on the application’s native data protection functions to generate copies (usually to file system or pseudo media targets). Examples include database dumps to local or remote disk storage. We can go a step further and embed control-and-data path plugins into the application layer. These plugins let the application’s native data protection methods interface with Data Protection storage for efficient data transfer, and with Data Protection management software for policy, scheduling, reporting and audit purposes.


Like the server approach, the application native approach is agnostic to the platform the application is running on, be it virtualized, bare metal or containerized, in public or private cloud environments. Where things get interesting is when we start leveraging infrastructure layers to support Data Protection requirements. The most common infrastructure-based approaches are hypervisor-centric and storage-centric Data Protection methods.


A by-product of infrastructure methods is that they require privileged access to the infrastructure’s interfaces to create protection copies. In private cloud environments this requires coordination and trust between the Backup or Protection Administrator and the Storage or Virtualization Administrator. This access is often negotiated when the service is first established. In Public Cloud environments there is no Storage or Virtualization Administrator we can talk with to negotiate access. These layers are off limits to consumers of the Public Cloud. If we want to exercise Data Protection at these layers, we have to rely on the services that the Public Cloud provider makes available. These services are often referred to as Cloud-based Data Protection.


For example, Amazon Web Services (AWS) offers snapshots of Elastic Block Store (EBS) volumes to S3 storage. This provides protection of volumes at the block level. Microsoft Azure offers snapshots of VMs to Azure Blob Storage and the Azure Backup service for VM instances running Windows operating systems.


A common property of Cloud-based Data Protection services, and of infrastructure-centric protection methods for that matter, is that they are tightly coupled. Tight coupling means the technologies and methods are highly dependent on one another to function, which allows the method to perform at peak efficiency. For example, the method is able to track the underlying data that is changing in the virtual machine instance and, when appropriate, copy only the data that has changed since the previous copy.
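
To make the idea of copying only changed data concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how any provider implements snapshots: a tightly coupled platform records writes as they happen, whereas this sketch stands in for that by comparing per-block digests. The block size, the function names and the assumption that a volume never shrinks are all hypothetical.

```python
import hashlib

BLOCK_SIZE = 4  # deliberately tiny so the example is easy to follow


def split_blocks(volume: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut a volume image into fixed-size blocks."""
    return [volume[i:i + block_size] for i in range(0, len(volume), block_size)]


def incremental_copy(previous: bytes, current: bytes) -> dict[int, bytes]:
    """Return only the blocks that differ from the previous copy, keyed by block index."""
    old_digests = [hashlib.sha256(b).digest() for b in split_blocks(previous)]
    changed: dict[int, bytes] = {}
    for index, block in enumerate(split_blocks(current)):
        is_new = index >= len(old_digests)
        if is_new or hashlib.sha256(block).digest() != old_digests[index]:
            changed[index] = block
    return changed


def apply_incremental(previous: bytes, changed: dict[int, bytes]) -> bytes:
    """Rebuild the current volume from the previous copy plus the changed blocks.

    Assumes the volume only stays the same size or grows (hypothetical simplification).
    """
    blocks = split_blocks(previous)
    for index in sorted(changed):
        if index < len(blocks):
            blocks[index] = changed[index]
        else:
            blocks.append(changed[index])
    return b"".join(blocks)


if __name__ == "__main__":
    before = b"AAAABBBBCCCC"
    after = b"AAAAXXXXCCCCDDDD"
    delta = incremental_copy(before, after)
    print(sorted(delta))  # indices of the blocks that had to be copied
    print(apply_incremental(before, delta) == after)
```

The efficiency comes from the small delta; the coupling comes from the fact that the delta is only meaningful alongside the previous copy and the platform that produced it.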


Tightly coupled methods have gained popularity in recent years simply because data volumes continue to grow to the extent that traditional methods are struggling to keep up. However, there are some important trade-offs being made when we bet the business solely on tightly coupled Data Protection methods.


Tight coupling trades flexibility for efficiency. In other words, we can have a very efficient capability, but it is highly inflexible. In the case of Data Protection, a solution focused on flexibility allows one to free the data copies from the underlying infrastructure. For example, in the case of AWS snapshot copies to S3, the copies are forever tied to the public cloud platform. This is a critical point that requires careful attention when devising a Public Cloud Data Protection strategy.


The best way I can describe the implications is to compare the situation to traditional on-premises Data Protection methods. With on-premises solutions, you are in full control of the creation, storage and recovery processes. For example, let us assume you have implemented a protection solution using a vendor’s product. This product would normally implement and manage the process of creating copies and storing these copies on media in the vendor’s data format (which in modern times is native to the application being protected). The property we usually take for granted here is that we can move these copies from one media format to another or from one location to another. We can also recover them to different systems and platforms. This heterogeneity offers flexibility, which enables choice: the choice to change our mind or adjust our approach to managing copies as conditions change. For example, with loosely coupled copies, we can migrate them from one public cloud provider’s object storage (e.g. AWS S3) to another public cloud provider’s object storage (e.g. Azure Blob Storage), or even back to private cloud object storage (e.g. Elastic Cloud Storage), if we decide to bring certain workloads on premises.
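
A short Python sketch may help show why loosely coupled copies are portable. It assumes the copies are plain objects addressed by key, and it uses in-memory stand-ins rather than any real provider SDK. The `ObjectStore` interface, the `InMemoryStore` class and the `migrate` function are hypothetical names introduced only for illustration.

```python
import hashlib
from typing import Protocol


class ObjectStore(Protocol):
    """Minimal interface any object storage back end could satisfy (hypothetical)."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...
    def keys(self) -> list[str]: ...


class InMemoryStore:
    """Stand-in for a provider's object storage; no real cloud API is called."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def keys(self) -> list[str]:
        return sorted(self._objects)


def migrate(source: ObjectStore, target: ObjectStore) -> int:
    """Copy every protection copy to the target and verify it by checksum."""
    moved = 0
    for key in source.keys():
        data = source.get(key)
        target.put(key, data)
        if hashlib.sha256(target.get(key)).digest() != hashlib.sha256(data).digest():
            raise IOError(f"checksum mismatch after copying {key}")
        moved += 1
    return moved


if __name__ == "__main__":
    provider_a = InMemoryStore("provider-a")
    provider_b = InMemoryStore("provider-b")
    provider_a.put("backups/db-001", b"database dump in an application-native format")
    print(migrate(provider_a, provider_b), provider_b.keys())
```

Because the copy format does not depend on the platform that stores it, the migration reduces to reading objects from one store and writing them to another. A platform-bound snapshot offers no equivalent of `get` that returns something another platform can use.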


Despite these trade-offs, there are very good reasons to use a public cloud provider’s native Data Protection functions. For example, if we want fast full VM recovery back to the source, we would be hard pressed to find a faster solution. However, cloud-native solutions do not address all recovery scenarios and lack flexibility. To mitigate these risks, a dual approach is often pursued that addresses the efficiency, speed and flexibility required by Enterprise applications, in public, private or hybrid cloud models.


My general advice to customers is to leverage tightly coupled Data Protection methods for short-lived Data Protection requirements, alongside loosely coupled methods. In the case of Public Cloud models, this requires the deployment of software technologies (or hardware, via services like Direct Connect and ExpressRoute) that are not tied to the Public Cloud provider’s platform or data formats. As a consumer of Public Cloud services, this will afford you the flexibility to free your data copies in the future, if need be.


Our Strategy


At Dell EMC we recognize that customers will deploy workloads across a variety of cloud delivery models. These workloads will require multiple forms of Data Protection, based on the value of the data and the desire to maintain independence from the underlying infrastructure or platform hosting the workloads.


Our strategy is to provide customers Data Protection everywhere. This protection will be delivered via multiple avenues, including orchestrating the control path of the Public Cloud provider’s native solutions while allowing the Public Cloud to host and manage the data path and storage. For workloads that require ultimate flexibility and independent Data Protection copies, we will also manage the data path and storage, to enable copies to remain agnostic to the cloud vendor. Furthermore, for customers that choose to consume SaaS-based solutions, we will continue to work with SaaS vendors to expand our existing SaaS Data Protection offering to export and manage data copies using the vendor’s available APIs, to the extent possible.


Ultimately, customers will choose which path they take. Our strategy is to ensure our Data Protection solutions allow customers to take any path available to them.


~Peter Marelas @pmarelas

The Hashtag Cortex

Escaping the deadly radiation of the tech industry pulsar, this time Inside The Data Cortex.

  • This year has been “The Year of all Flash” and Mark didn’t notice.
  • Weeks after day one Stephen and Mark discuss day one. It was kind of like day zero and not much different than day two. But day two had the world’s largest donut at Dell EMC World.
  • Weight gain and not much weight loss at tradeshows.
  • Stephen on the Goldilocks approach to embracing the public cloud and the tyranny of selection bias.
  • Do Google consider themselves an enterprise supplier?
  • This time of year there’s no sunshine anywhere outside of California. Says man living in California.
  • Software Defined Storage is kind of interesting. Says customer who thinks the installation packages will do everything.
  • Scale out is still a hard problem.
  • Mark has looked at home grown storage solutions and sees a lot of ugly babies. (Sorry! He’s not sorry.)
  • The Botnet of Things is real and your dishwasher is hitting someone with a denial of service attack right now.
  • This episode in reading things: Alcatraz Versus the Evil Librarians, Benjamin Franklin: An American Life, Steinbeck’s The Winter of Our Discontent, Ken Clarke’s Kind of Blue and Stalin: Paradoxes of Power.

No one likes to give up power. Go before you are pushed. Because it will be people like us doing the pushing.

Stephen Manley @makitadremel Mark Twomey @Storagezilla

Cloud Native for The Enterprise

The “Cloud Native” application architecture is disrupting the way organizations build software because it generates massive productivity boosts. The two major benefits are: 1) getting from idea to production faster and 2) scaling more easily. “Cloud native” helps remedy the biggest challenges of modern software. Cloud native patterns and practices, originally crafted by the likes of Google or Netflix, have found their way into the industry via popular open source projects. Developers from small startups all the way to huge corporations are adopting those technologies and practices and weaving them into the software fabric of their business.


In this series of articles, we’ll explore how cloud native practices are affecting companies, especially enterprises. We’ll look at this transformation through the eyes of a “typical enterprise” focusing on their unique needs. We’ll make some observations on their challenges, how they are addressing them, and what vendor opportunities lie ahead in this market.

“If you want to understand today, you have to search yesterday”

~ Pearl Buck

Part I – The Heroku Use Case

To understand the current state, we’ll first briefly review the history of what we call “Cloud Native for the Masses”. Heroku was the first successful initiative popularizing cloud native for developers outside the software giant club. Until Heroku, organizations had to build an entire stack of middleware software to deliver cloud native microservices in production. The software giants have done exactly that: built or patched together a variety of tools to maintain their microservices. Netflix, Twitter, and Facebook all had similar, but proprietary, stacks for doing service discovery, configuration, monitoring, container orchestration, etc. Heroku engineers were able to distill a key ingredient of cloud native as we know it today: the “platform” component, aka PaaS.


[Figure: Heroku – Cloud Native for the masses]

Heroku began by hosting small-to-medium scale web applications, cleanly separating the platform’s responsibilities from the application’s. Coining the term 12-factor apps, Heroku could run, configure and scale various types of web applications. This enabled their customers to focus on their business and applications, delegating maintenance of the platform to Heroku. We won’t go into a detailed description of the Heroku architecture and workflows, but we will describe one key aspect, since it is important for this analysis: how Heroku handles stateful services.


In order to orchestrate the application, Heroku had to create a clear line between the stateful and stateless parts of the application; in Heroku terminology, it separated the application from Add-ons. The reason for the split is physics. Scaling, porting, and reconfiguring stateless processes is done seamlessly using modern container orchestration (dynos, in Heroku terminology). Doing the same operations for stateful processes requires moving physical bits between the processes and maintaining their consistency. To scale a stateless process, you run another instance. Scaling a stateful service involves setting up network-intensive distributed state protocols that differ from system to system. Heroku’s way of solving the stateful needs is by focusing on popular stateful services and offering those as dependencies to the applications. Heroku added Postgres, Redis, RabbitMQ, email service, SMS – all as services in Heroku’s add-ons marketplace maintained by Heroku engineers (DBAs, IT).
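
The asymmetry between the two kinds of scaling can be sketched in a few lines of Python. This is an illustration only and does not describe Heroku’s internals: it assumes a stateful service that partitions keys across nodes with simple hash-modulo placement, and every function name here is hypothetical.

```python
import hashlib


def owner(key: str, node_count: int) -> int:
    """Pick the node that stores a key, using hash-modulo placement (an assumption)."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def scale_stateless(instances: int, extra: int) -> tuple[int, int]:
    """Add instances of a stateless process. Returns (new count, data items moved)."""
    return instances + extra, 0  # nothing to move: the process holds no data


def scale_stateful(keys: list[str], old_nodes: int, new_nodes: int) -> int:
    """Count the keys whose owning node changes when the cluster is resized."""
    return sum(1 for key in keys if owner(key, old_nodes) != owner(key, new_nodes))


if __name__ == "__main__":
    keys = [f"user-{i}" for i in range(1000)]
    print(scale_stateless(instances=2, extra=1))
    print(scale_stateful(keys, old_nodes=2, new_nodes=3), "of", len(keys), "keys must move")
```

Adding a stateless instance moves no data at all, while resizing the stateful cluster forces a share of the keys across the network, and each move has to preserve consistency. Real systems reduce that share with techniques such as consistent hashing, but they cannot eliminate it.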


[Figure: App space vs. Stateful Marketplace]

For “the Heroku use case”, this separation was elegant since it not only solved real world problems, but also was relatively easy to monetize. Developers could use a variety of programming languages and stacks for their applications without requiring Heroku to support each stack (to learn more, search for “heroku buildpacks”). Focusing on web applications, Heroku’s experience allowed them to offer unique capabilities out of the box, like powerful HTTP caches, developer workflow tools, etc. Another advantage of this approach is that it creates stickiness with Heroku, because the data is managed by the service and porting the data won’t be a straightforward task. Over time, more vendors offered Add-ons on the Heroku marketplace, which enabled 3rd parties to leverage the Heroku platform.


Seeing the potential in the Heroku architecture led to various initiatives trying to imitate it in a more generic and portable manner. In the next article we’ll focus on one of those initiatives – Cloud Foundry. We’ll observe the resulting implications and effects when applying the Heroku architecture on use cases other than the “Heroku hosting use case” it was designed for.


To summarize, we’ve seen that in order to deliver Cloud Native Applications, developers need a platform: either they build it themselves, or they buy or lease one. In most organizations, having engineers with this skillset doesn’t make sense, hence there is room for consumable PaaS platforms. A common metaphor I heard from Pivotal puts this point nicely: “Some people are Superman so they can fly around, but most of us need to board an airplane”.


Amit Lieberman @shpandrak, Assaf Natanzon @ANatanzon, Udi Shemer @UdiShemer

Road to Efficiency, Part 1

In the new IT, there are so many buzzwords, especially around cloud services. Where does the cloud actually fit?
Clouds can be private or public, and they can serve traditional “Platform 2” applications as well as new “Platform 3” applications. So let’s look at cloud services from that perspective.


[Figure: matrix of private and public clouds against Platform 2 and Platform 3 applications]

Of course, some things don’t change regardless of the quadrant of the matrix. We always need to:

  • Protect the data wherever it is.
  • Simplify management across environments.
  • Get more value out of the data.

When talking about the cloud, two important aspects are frequently overlooked:

  1. Private clouds should be as easy to manage and as elastic and flexible as public clouds are. Private clouds shouldn’t get graded on a curve because they come from traditional IT teams. In that sense, I appreciate the urgency that the public cloud revolution has placed on traditional infrastructure providers. It’s time to modernize the solution end to end, not just build a bigger system.

  2. If you move your data to the public cloud, you still need to protect it. The responsibility for resiliency and access may move to the cloud solution provider, but if data is deleted (inadvertently or intentionally) or corrupted on a logical level (and we know applications never corrupt data, don’t we?), it doesn’t matter on which infrastructure it runs. Furthermore, most businesses typically require more than just the most recent point in time copy of data. Finally, remember that these requirements apply equally to IaaS, PaaS, and SaaS solutions.


What are we building to help with this transition? In the Data Protection Cloud unit of EMC’s Core Technologies Division, we look at four primary items:

1.  Data Tiering to Cloud:

Any data, regardless of whether it sits on primary storage, protection storage, end points or in-cloud should be able to move to and from any cloud. This is very important because it covers all customer data—past, present, and future!

2.  In-Cloud Data and Application Protection:

EMC already has industry-leading on-premises enterprise data protection solutions with NetWorker and Avamar data protection software paired with Data Domain. But we need to be able to protect data that sits in public clouds as well as opaque data that is present within some software-as-a-service solutions. New products, such as Spanning Backup by EMC, were created for SaaS application data protection.

3.  Converged File and Protection Services:

Everybody in the industry is talking about converged infrastructure and focusing on different models of consuming on-premises technology. In the cloud, we can converge multiple types of data usage into a simplified and unified solution. The cloud can be my authoritative central copy of data while I maintain local caches as I need them for fast access. Suddenly, I don’t have to worry about managing multiple copies, including distribution and replication; I can have all the features I expect from data protection built in with my primary solution. And of course, that will apply to both public and private clouds. The best part? I don’t have to worry about all of the dedicated physical infrastructure to make that happen. But why stop at converging infrastructure? Converge your production and protection, globally! Can we do this? Stay tuned!

4.  Extend Search, Hold, Discovery Platform:

In the end, we need to enhance the value of the data itself. One way is by providing insight into all data, regardless of whether it resides on-premises or in the cloud, on primary storage or as part of a data protection solution. Once we can gather and identify all data, the key is unlocking its value. Global search, hold and discovery are just some of the initial use cases.

After seeing how far the cloud can take us, we can now map all four items to the same diagram we used earlier:

[Figure: the four items mapped onto the cloud matrix shown earlier]


How important is it for IT to adapt to new times and actively seek ways to improve—not just financial efficiency, but in delivering value to the business? Take a look at the following quote. In my mind, nothing can be more true today in the IT world:

“The advantage you have yesterday, will be replaced by the trends of tomorrow. You don’t have to do anything wrong, as long as your competitors catch the wave and do it right, you can lose out and fail.”

-Stephen Elop, Ex Nokia CEO

Vladimir Mandic @vmandic00

Make IT Rain With Your EMC Hybrid Cloud

Today’s technology forecast includes an accelerated shift to the hybrid cloud. With over 75% of enterprise IT organizations deploying hybrid cloud already, adoption is predicted to continue to grow. In only the last few years, hybrid cloud has become the preferred way to deliver IT services and drive business transformation. This climate change enables IT to better leverage Public Cloud and Service Providers, delivering new app services with more flexibility and agility. It also helps IT optimize their Private Clouds, delivering traditional app services with more efficiency and trust. In today’s Hybrid Cloud announcement, EMC adds even more across the horizons of both of these worlds.

New capabilities provide more confluence between EMC’s storage and protection technologies, creating the perfect storm to deliver more business value.  New automation optimizes Hybrid Cloud efficiency by storing information in the right place at the right costs. New protection capabilities extend data services and provide a path to your trusted Hybrid Cloud. The convergence between storage and protection eliminates silos to boost service levels with the right mix of cost, performance, availability, replication, backup and archive.

Here’s how… and maybe more importantly… WHY it matters.

On the Storage side, enhancements to VMAX3 FAST.X and CloudArray make placement and movement of information within a Hybrid Cloud automated and transparent. Inactive data can be moved to an external cloud to lower costs while the data remains accessible. It can provide resource elasticity to service storage spikes, providing oversubscription headroom. And it allows workloads to move back and forth between VMAX storage and cloud storage with a simple click to support projects and tests.

Under the covers, CloudArray is included in the same storage resource pool as other VMAX3 storage tiers and FAST.X connected arrays. Using Service Level Objective (SLO) management, admins apply an SLO to an app, and the system figures out where to most efficiently place the data. For apps that require moderate performance, data can be moved to an external cloud for low cost. For apps that need a higher SLO, a simple click can change the policy and promote the data to a faster tier to increase performance. Management of these tiers is abstracted, allowing admins to achieve the desired SLO without being exposed to the complexity under the covers. Very cool stuff for the folks who manage multiple storage arrays.
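
As a rough mental model of SLO-driven placement, consider the Python sketch below. It is not the VMAX3 algorithm, which this post does not describe. The tier names, latency figures, relative costs and SLO targets are all invented for illustration, and the selection rule (cheapest tier that still meets the latency target) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    latency_ms: float   # illustrative response time, not a measured value
    cost_per_gb: float  # illustrative relative cost


# Hypothetical storage resource pool spanning local tiers and an external cloud.
TIERS = [
    Tier("flash", latency_ms=1.0, cost_per_gb=10.0),
    Tier("disk", latency_ms=8.0, cost_per_gb=3.0),
    Tier("external-cloud", latency_ms=50.0, cost_per_gb=1.0),
]

# Hypothetical SLO names mapped to a maximum acceptable latency in milliseconds.
SLO_TARGETS_MS = {"high": 2.0, "moderate": 100.0}


def place(slo: str, tiers: list[Tier] = TIERS) -> Tier:
    """Pick the cheapest tier whose latency still satisfies the requested SLO."""
    target = SLO_TARGETS_MS[slo]
    candidates = [tier for tier in tiers if tier.latency_ms <= target]
    if not candidates:
        raise ValueError(f"no tier can satisfy SLO {slo!r}")
    return min(candidates, key=lambda tier: tier.cost_per_gb)


if __name__ == "__main__":
    print(place("moderate").name)  # the low-cost tier is acceptable
    print(place("high").name)      # changing the SLO promotes the data
```

The admin only ever states the SLO; which tier satisfies it is the system’s problem, which is the abstraction the paragraph above describes.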

On the Protection side, new capabilities protect your data wherever it resides: Private, Public, or Hybrid Cloud. IT can deliver the right service level by leveraging a continuum of protection services, from availability to backup to archiving. Here are some of the cool highlights.

CloudBoost 2.0 delivers better performance, management, and flexibility to back up using a choice of Cloud providers, including EMC, Google, Azure, and Amazon. Improved compression doubles efficiency, reducing bandwidth and capacity requirements and lowering costs. A new data cache speeds up backup and restore operations, improving RTOs and RPOs. And bandwidth optimizations and tuning deliver greater network throughput to the cloud with consistent, predictable performance. For backup admins, these capabilities address real pain points that have made using cloud providers as a backup target an unrealistic option.

Data Domain 5.7 introduces secure multi-tenancy updates combined with capacity reporting. And for those ever-growing backups, a higher density shelf provides 33% more system capacity. Network isolation enables secure tenant admin access by specifying IP addresses to restrict access. The combination enables self-service reporting and monitoring by tenant admins and users. With DD 5.7, IT organizations can become more like internal service providers by offering better visibility into what users are consuming, and how much it actually costs.

EMC Data Protection Suite for Backup and for Archive delivers the bow that wraps up a complete packaged solution for end-to-end protection and archive. NetWorker 9’s new universal policy engine automates and simplifies protection regardless of where your data resides. Under the covers, a policy engine automates movement of protection data through tiers of storage. Protected data can be local for immediate access while cold data can be handed off to cost-efficient cloud targets.

In addition, new common services provide improved integration. DP Search, unified with Avamar and NetWorker, now supports metadata and text indexing of filesystem and NDMP backups. What makes DP Search slick is how it enables new insight and business value from backup data. Combined with CloudBoost as a cloud storage tier for NetWorker and Avamar, it offers a way to replace tape for long term retention using a flexible and reliable cloud storage option.

Going forward, users can expect EMC storage and protection services to continue to become more integrated with their Hybrid Cloud. While there’s a ton of new capabilities here today, rest assured, the forecast is for more innovation.

Hybrid Cloud will transform IT to deliver services with more value to the business. Delivering IT services using Private versus Public Cloud capabilities is no longer an “Or”. For IT, the choices are about the “And”.  And many are counting on EMC to deliver on the needs of IT and support that choice – rain and shine.

Scott Delandy @scottdelandy

Podcast: The Consumption Model Cortex

Hurtling through the bloodstream in an Explorer class nano-ship, Stephen Manley and Mark Twomey once again step Inside The Data Cortex.

In this episode:

  • The deal which cannot be named.
  • It’s 2015, 2005, 1995 all over again with the discussion of consumption models.
  • We’ve decided John Wick & Weekend at Bernie’s have nothing in common.
  • Adventures in disappointing books.


Stephen Manley @makitadremel Mark Twomey @Storagezilla