Open-source software (OSS) is dramatically accelerating the evolution of open market platforms and open storage architectures. Startups and established companies are leveraging OSS to quickly build out storage infrastructure, which allows them to focus their limited resources on developing the core intellectual property that differentiates them. At EMC, we recognize the value of OSS. We do not just leverage open source; we also contribute to it and partner with others to advance it in support of new technologies. Recently, we transformed our mid-range platform’s proprietary interconnect software to a scalable open source model and contributed core transport drivers to the OSS community.
Why Open Source Software (OSS)?
Today you can find low-cost server, storage, and integrated infrastructure with the same enterprise features found in datacenter infrastructure from companies like EMC. Adoption of open market platforms has been made possible by the standardization of infrastructure and the maturity of the OSS that supports it. The historical concerns about lack of enterprise coverage or time-to-market are disappearing because such a strong community supports OSS. The days of custom microcode, kernels, and device drivers are gone, and that changes how datacenter infrastructure vendors operate. There is no need to spend countless dollars maintaining and porting custom code between product releases. Relying on OSS also reduces the risk of depending on a small, specialized set of engineers who may leave at any time. OSS is changing how we build infrastructure.
At EMC, we recognize the value of OSS as it relates to our infrastructure development. This goes beyond simply leveraging open source: we also contribute to open source and work with our industry partners to advance it in support of new technologies and open market platforms. Some of the advantages we see are:
- Leverage a world-wide community of open source developers
  - Reduced engineering cost – cost is spread across the OSS community.
  - Focus more of our engineering dollars on intellectual property investments.
  - Improved quality and availability – faster discovery and resolution.
  - Faster time to market – community developed and available across many distributions.
  - Increase mindshare – exchange, propose, and support new features and capabilities.
  - Less risk from attrition – gaps in training are more tolerable because the community continues to advance the investment.
- Contribute OSS solutions
  - Influence adoption of new technologies and enterprise capabilities found in open market platforms.
  - Enable open market platforms as an alternate source for product offerings.
  - Generate industry credibility.
- Motivate engineers and attract new talent
  - Create projects to transform our software infrastructure to leverage and advance OSS within our products.
  - Network with the community by participating in public venues such as mailing lists, meet-ups, industry conferences, blogs, and speaking engagements.
There are many great examples across EMC where we are engaged with the OSS community and contributing to OSS. I want to discuss how we used OSS to transform a part of our software architecture while contributing enterprise level enhancements to a section of the Linux Kernel.
This example is about enabling support for open market platforms with internal PCI Express NTB interconnects.
EMC and OSS – A success story
Storage systems need to reliably store incoming data, even in the face of system or component failures. Connecting servers in a distributed system is an essential part of building highly available and reliable products. It is especially useful for committing newly written data to multiple systems to ensure availability of data services. Enterprise storage products commonly connect a pair of servers using an internal PCI Express bus. As data flows into a server, it is persistently stored locally and mirrored to the peer server before the write is acknowledged to the host. The PCIe connection between the two servers is formed by configuring a special device called a Non-Transparent Bridge (NTB) in the root port complex of each Intel® Processor.
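The write path described above (persist locally, mirror to the peer, then acknowledge) can be sketched in a few lines of Python. This is only an illustrative model, not EMC's implementation: `StorageNode`, `host_write`, `receive_mirror`, and `make_pair` are hypothetical names, and a dictionary stands in for persistent media. How a real product behaves when the peer is unreachable is not covered here, so the sketch simply withholds the acknowledgement.

```python
class StorageNode:
    """Hypothetical storage server; `log` stands in for persistent media."""

    def __init__(self, name):
        self.name = name
        self.log = {}
        self.peer = None
        self.alive = True

    def receive_mirror(self, key, data):
        # In the article, this copy travels over the internal PCIe/NTB link.
        if not self.alive:
            raise ConnectionError(f"{self.name} is unreachable")
        self.log[key] = data

    def host_write(self, key, data):
        """Persist locally, mirror to the peer, and only then acknowledge."""
        self.log[key] = data
        self.peer.receive_mirror(key, data)  # raises if the peer is down
        return "ACK"


def make_pair():
    """Build two nodes that mirror to each other (names are made up)."""
    a, b = StorageNode("node-a"), StorageNode("node-b")
    a.peer, b.peer = b, a
    return a, b
```

Calling `make_pair()` and then `host_write` on either node returns `"ACK"` only after both nodes hold the data, which is the ordering guarantee the paragraph describes.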
Each NTB connects the memory between two systems. The NTB is programmed with an address window within the local Processor address space that represents the peer address space, as well as a set of registers that translate local accesses into peer memory accesses. Each NTB also contains a set of doorbell registers that are programmed and used to signal the remote Processor. Once this configuration is set on both servers, the Direct Memory Access (DMA) controller internal to each Processor can be programmed to move data from the local system to the remote system through the NTB link. The last entry of the DMA transfer contains a write to the doorbell register, which signals an interrupt to the peer so that it knows to look for the data.
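To make the mechanism concrete, here is a small Python simulation of the three pieces just described: an address window that translates local addresses into peer addresses, a chain of DMA copy entries, and a final entry that writes the peer's doorbell. It is a sketch under stated assumptions rather than a model of real hardware: `Host`, `NtbWindow`, `build_chain`, and `run_chain` are hypothetical names, memory is a plain `bytearray`, and the window is assumed to fit inside the peer's memory.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical server: flat memory plus one doorbell register."""
    memory: bytearray = field(default_factory=lambda: bytearray(4096))
    doorbell: int = 0
    interrupts: list = field(default_factory=list)

    def ring(self, bits):
        # Setting doorbell bits is what raises the interrupt on the peer.
        self.doorbell |= bits
        self.interrupts.append(bits)


@dataclass
class NtbWindow:
    """Maps [local_base, local_base + size) onto peer memory at peer_base."""
    peer: Host
    local_base: int
    size: int
    peer_base: int

    def translate(self, local_addr):
        offset = local_addr - self.local_base
        if not 0 <= offset < self.size:
            raise ValueError("address outside NTB window")
        return self.peer_base + offset

    def write(self, local_addr, data):
        start = self.translate(local_addr)
        if start - self.peer_base + len(data) > self.size:
            raise ValueError("write runs past the end of the NTB window")
        self.peer.memory[start:start + len(data)] = data


def build_chain(copies, doorbell_bit):
    """Queue the copy entries, then append the doorbell write as the last entry."""
    chain = [("copy", addr, payload) for addr, payload in copies]
    chain.append(("doorbell", doorbell_bit, None))
    return chain


def run_chain(window, chain):
    """Execute entries in order, as a DMA engine would walk its descriptors."""
    for op, arg, payload in chain:
        if op == "copy":
            window.write(arg, payload)
        else:
            window.peer.ring(arg)
```

Placing the doorbell write at the end of the same chain means the peer is only interrupted after every preceding copy has been issued, so it does not need to poll for the data.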
A visual example of this configuration can be found in Figure 1. This architecture is employed in EMC Midrange products and is available in open market platforms today. Two such examples are from Quanta and AIC:
- Quanta – http://www.qct.io/Product/Server/Rackmount-Server/Multi-node-Server/QuantaPlex-T21SR-2U-2-Node-p284c77c70c83c185
- AIC – http://www.aicipc.com/ProductDetail.aspx?ref=HA201-TP
Figure 1 – Dual Server with redundant internal PCI Express NTB Interconnects
EMC has been leveraging this architecture for years. We have maintained a custom NTB driver, a custom DMA controller driver, and a custom protocol stack to communicate across the link. In the last decade, Intel® has released Linux Kernel support for the DMA controller and, more recently, for NTB. In this timeframe, however, EMC Midrange products were based on the Windows operating system. It was not until 2013-14, with the introduction of the VNXe product line, that we transformed our product architecture to leverage the Linux operating system. Realizing the technical overhead of porting the entire stack, including hardware drivers, we started to eliminate our custom code and introduce new code that leveraged OSS. One of our successes in that effort has been to provide server redundancy using both NTB PCI Express and Remote Direct Memory Access (RDMA). We have since contributed the changes to OSS to support other proprietary and open market platforms.
Figure 2 – NTB and RDMA server redundancy
To support this strategy, we transformed our proprietary communication stack to leverage an RDMA interface. RDMA describes the general concept where one system can efficiently read and write the memory of its peer, but it doesn’t specify an implementation or programming interface to do so. NTB transfers are just one implementation of this concept; others include InfiniBand, RoCE, and iWARP. The Linux InfiniBand Verbs (IB Verbs) and the OpenFabrics Enterprise Distribution (OFED) provide hardware drivers and a programming abstraction, so that the same RDMA-aware applications can run on top of different RDMA implementations.
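The value of that abstraction is easiest to see in code. The Python sketch below shows an application written against one interface and run unchanged on two different transports. It is only an analogy: `RdmaTransport`, `rdma_write`, `NtbTransport`, `RoceTransport`, and `MessagingApp` are hypothetical names, not the real IB Verbs API, and both transports are simulated by writing into a local `bytearray` that stands in for peer memory.

```python
from abc import ABC, abstractmethod


class RdmaTransport(ABC):
    """Hypothetical verbs-style interface; NOT the real IB Verbs API."""

    @abstractmethod
    def rdma_write(self, remote_offset: int, data: bytes) -> None:
        """Place `data` into peer memory at `remote_offset`."""


class _SimulatedTransport(RdmaTransport):
    """Shared stand-in: 'peer memory' is just a local bytearray."""
    name = "simulated"

    def __init__(self, peer_memory: bytearray):
        self.peer_memory = peer_memory

    def rdma_write(self, remote_offset, data):
        self.peer_memory[remote_offset:remote_offset + len(data)] = data


class NtbTransport(_SimulatedTransport):
    name = "ntb"    # would move data through the NTB window in a real system


class RoceTransport(_SimulatedTransport):
    name = "roce"   # would go through a network adapter in a real system


class MessagingApp:
    """Application code depends only on the abstract transport."""
    SLOT_SIZE = 64

    def __init__(self, transport: RdmaTransport):
        self.transport = transport

    def send(self, slot: int, message: bytes) -> None:
        # Each message lands in a fixed-size slot in peer memory.
        self.transport.rdma_write(slot * self.SLOT_SIZE, message[:self.SLOT_SIZE])
```

Swapping `NtbTransport` for `RoceTransport` when constructing `MessagingApp` requires no change to `send`, which mirrors the goal of writing the application once against a common interface.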
We transformed our legacy peer-to-peer Messaging App into a new flexible application that leverages OSS. The Messaging App (seen below) is one component in our storage driver and data protection stack. We ported it to an RDMA-aware application using the common Linux IB Verbs interface. We then created an NTB transport driver, called NTRDMA, which leverages the open source DMA driver. This eliminated our proprietary code stack and lets us swap out the transport without any application changes. For platforms that don’t have NTB, we can simply use an InfiniBand, RoCE, or iWARP adapter with open source drivers.
Figure 3 – Transformation of EMC proprietary stack
EMC and OSS – Contributions
EMC has since released the NTRDMA transport driver, along with many NTB infrastructure and performance enhancements in the Linux Kernel. Our contributions draw on our enterprise expertise in building high-performing multi-core infrastructure. This is a great example of how EMC is contributing its expertise to OSS.
The submission of this work to Linux Kernel v4.2-rc1 was a collaboration between Allen Hubbe (EMC) and the NTB subsystem maintainers, Jon Mason (Broadcom) and Dave Jiang (Intel).
NTB pull request (Linus Torvalds)
“Linux 4.2 Offers Performance Improvements for Non-Transparent Bridging”
Want to find out more?
Allen Hubbe (EMC) and Dave Jiang (Intel) are co-presenting on this topic at Linux Vault 2016 in Raleigh, NC, on April 20th, 2016. http://events.linuxfoundation.org/events/vault
–Daniel Cummins @cummins603