In late 2015 I was researching the options available to protect application workloads running in public cloud environments. In this post I will discuss my findings, and what we are doing at Dell EMC to bring Enterprise grade Data Protection Solutions to workloads running in public cloud environments.
To understand how Data Protection applies to public cloud environments, we need to recognize that Data Protection can occur at different layers in the infrastructure. These include the server, storage, hypervisor (if virtualized), application and platform layer. When we implement Data Protection for on premises environments, our ability to exercise Data Protection functions at any one of these layers depends upon the technologies in use.
At the server layer, we typically deploy an agent-based solution that manages the creation of Data Protection copies of the running environment. This method can be used for virtualized, bare metal and even containerized environments that persist data.
At the application layer we typically rely on the applications’ native data protection functions to generate copies (usually to file system or pseudo media targets). Examples of this can include database dumps to local or remote disk storage. We can go a step further and embed control-and-data path plugins into the application layer to enable the application’s native data protection methods to interface with Data Protection storage for efficient data transfer and Data Protection management software for policy, scheduling, reporting and audit purposes.
Like the server approach, the application native approach is agnostic to the platform the application is running on, be it virtualized, bare metal or containerized, in public or private cloud environments. Where things get interesting is when we start leveraging infrastructure layers to support Data Protection requirements. The most common infrastructure layers used are the hypervisor and storage-centric Data Protection methods.
A by-product of infrastructure methods is they require privileged access to the infrastructure’s interfaces to create protection copies. In private cloud environments this requires coordination and trust between the Backup or Protection Administrator and the Storage or Virtualization Administrator. This access is often negotiated when the service is first established. In Public Cloud environments there is no Storage or Virtualization Administrator we can talk with to negotiate access. These layers are off limits to consumers of the Public Cloud. If we want to exercise Data Protection at these layers, we have to rely on the services that the Public Cloud provider makes available. These services are often referred to as Cloud-based Data Protection.
For example, Amazon Web Services (AWS) offers snapshots of Elastic Block Storage (EBS) volumes to S3 storage. This provides protection of volumes at the block-level. Microsoft Azure offers snapshots of VM’s to Azure Blob Storage and the Azure backup service for VM instances running the Windows Operating Systems.
A common property of Cloud-based Data Protection services and infrastructure-centric protection methods for that matter, is they are tightly coupled. Tight coupling means the technologies and methods are highly dependent on one another to function, which allows the method to perform at peak efficiency. For example, the method is able to track the underlying data that is changing in the virtual machine instance, and when appropriate take copies of the data that has changed between copies.
Tightly coupled methods have gained popularity in recent years simply because data volumes continue to grow to the extent that traditional methods are struggling to keep up. However, there are some important trade-offs being made when we bet the business solely on tightly coupled Data Protection methods.
Tight coupling trades efficiency for flexibility. In other words, we can have a very efficient capability, but it is highly inflexible. In the case of Data Protection, a solution focused on flexibility allows one to free the data copies from the underlying infrastructure. For example, in the case of AWS snapshot copies to S3, the copies are forever tied to the public cloud platform. This is a critical point that requires careful attention when devising a Public Cloud Data Protection strategy.
The best way I can describe the implications is to compare the situation to traditional on premises Data Protection methods. With on premises solutions, you are in full control of the creation, storage and recovery processes. For example, let us assume you have implemented a protection solution using a vendor’s product. This product would normally implement and manage the process of creating copies and storing these copies on media in the vendor’s data format (which in modern times is native to the application being protected). The property we usually take for granted here is we can move these copies from one media format to another or one location to another. We can also recover them to different systems and platforms. This heterogeneity offers flexibility, which enables choice. The choice to change our mind or adjust our approach to managing copies subject to changing conditions. For example, with loosely coupled copies, we can migrate them from one public cloud providers’ object storage (e.g. AWS S3) to another public cloud providers’ object storage (Azure Blob Storage), or even back to private cloud object storage (Elastic Cloud Storage), if we decide to bring certain workloads on premises.
Despite these trade-offs, there are very good reasons to use a public cloud providers native Data Protection functions. For example, if we want fast full VM recovery back to the source, we would be hard pressed to find a faster solution. However, cloud-native solutions do not address all recovery scenarios and lack flexibility. To mitigate these risks, a dual approach is often pursued that address the efficiency, speed and flexibility required by Enterprise applications, in public, private or hybrid cloud models.
My general advice to customers is to leverage tightly coupled Data Protection methods for short-lived Data Protection requirements, along with loosely coupled methods. In the case of Public Cloud models, this requires the deployment of software technologies (or hardware, via services like Direct Connect and ExpressRoute) that are not tied to the Public Cloud provider’s platform or data formats. As a consumer of Public Cloud services, this will afford you the flexibility to free your data copies if need be, in future.
At Dell EMC we recognize that customers will deploy workloads across a variety of cloud delivery models. These workloads will require multiple forms of Data Protection, based on the value of the data and the desire to maintain independence from the underlying infrastructure or platform hosting the workloads.
Our strategy is to provide customers Data Protection everywhere. This protection will be delivered via multiple avenues, including orchestrating the control path of Public Cloud provider’s native solutions, and allowing the Public Cloud to host and manage the data path and storage. For workloads that require ultimate flexibility and independent Data Protection copies, we will also manage the data path and storage, to enable copies to remain agnostic to the cloud vendor. Furthermore, for customers that choose to consume SaaS-based solutions, we will continue to work with SaaS vendors to expand our existing SaaS Data Protection offering to export and manage data copies using the vendor’s available API’s, to the extent possible.
Ultimately, customers will choose which path they take. Our strategy is to ensure our Data Protection solutions allow customers to take any path available to them.
~Peter Marelas @pmarelas