Marvell Blog

Featuring technology ideas and solutions worth sharing

February 20th, 2019

NVMe/TCP – Simplicity is the Key to Innovation

By Nishant Lodha, Product Marketing & Technical Marketing Manager, Marvell

Whether it is the aesthetics of the iPhone or a work of art like Monet’s ‘Water Lilies’, simplicity is often a very attractive trait. I hear this resonate in everyday examples from my own life – with my boss at work, whose mantra is “make it simple”, and my wife of 15 years telling my teenage daughter “beauty lies in simplicity”. For the record, both of these statements generally fall upon deaf ears.

The Non-Volatile Memory Express (NVMe) technology that is now driving the progression of data storage is another place where the value of simplicity is starting to be recognized, particularly with the advent of the NVMe-over-Fabrics (NVMe-oF) specification, which is just about to start seeing deployment. The simplest and most trusted of Ethernet fabrics, namely Transmission Control Protocol (TCP), has now been confirmed as an approved NVMe-oF transport by the NVM Express organization[1].


Figure 1: All the NVMe fabrics currently available

Just to give a bit of background information here, NVMe enables the efficient utilization of flash-based Solid State Drives (SSDs) by accessing them over a high-speed interface, like PCIe, and using a streamlined command set that is specifically designed for flash implementations. Because it runs over a local PCIe bus, NVMe on its own is limited to the confines of a single server, which presents a challenge when looking to scale out NVMe and access it from any element within the data center. This is where NVMe-oF comes in. All Flash Arrays (AFAs), Just a Bunch of Flash (JBOF) or Fabric-Attached Bunch of Flash (FBOF) and Software Defined Storage (SDS) architectures will each be able to incorporate a front end that has NVMe-oF connectivity as its foundation. As a result, servers, clients and applications will be able to access external storage resources far more effectively.

A series of ‘fabrics’ has now emerged for scaling out NVMe. The first of these was Ethernet Remote Direct Memory Access (RDMA), in both its RDMA over Converged Ethernet (RoCE) and Internet Wide-Area RDMA Protocol (iWARP) derivatives. It was followed soon after by NVMe-over-Fibre Channel (FC-NVMe), and then by ones based on FCoE, InfiniBand and Omni-Path.

But with so many fabric options already out there, why is it necessary to come up with another one? Do we really need NVMe-over-TCP (NVMe/TCP) too? Well, RDMA-based NVMe fabrics (whether RoCE or iWARP) are supposed to deliver the extremely low latency that NVMe requires via a myriad of different technologies – like zero copy and kernel bypass – driven by specialized Network Interface Controllers (NICs). However, several factors hamper this in practice, and they need to be taken into account.

  • Firstly, most of the earlier fabrics (like RoCE/iWARP) have no existing install base for storage networking to speak of (Fibre Channel is the only notable exception to this). For example, of the 12 million 10GbE+ NIC ports currently in operation within enterprise data centers, fewer than 5% have any RDMA capability (according to my quick back-of-the-envelope calculations).
  • Secondly, the most popular RDMA protocol (RoCE) mandates a lossless network on which to run (and this in turn requires highly skilled network engineers who command higher salaries). Even then, this protocol is prone to congestion problems, adding further frustration.
  • Finally, and perhaps most telling, the two RDMA protocols (RoCE and iWARP) are mutually incompatible.

Unlike any other NVMe fabric, TCP is hugely pervasive – it is absolutely everywhere. TCP/IP is the foundation of the Internet, and every single Ethernet NIC/network out there supports TCP. With TCP, availability and reliability are just not issues that need to be worried about. Extending the scale of NVMe over a TCP fabric seems like the logical thing to do.
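To make the "it is just TCP" point a little more concrete, here is a minimal Python sketch of the first message an NVMe/TCP host sends after opening an ordinary TCP socket: the Initialize Connection Request (ICReq). The field layout follows my reading of the NVMe/TCP transport specification (an 8-byte common header followed by the ICReq fields, 128 bytes in total). The function names, the default of port 4420 and the idea of sending it from a plain script are assumptions made purely for illustration; a real host would rely on the operating system's NVMe/TCP driver rather than code like this.

```python
import socket
import struct

ICREQ_PDU_TYPE = 0x00   # PDU type for Initialize Connection Request
ICREQ_LEN = 128         # ICReq is a fixed-size, 128-byte PDU


def build_icreq(maxr2t: int = 0, hpda: int = 0,
                header_digest: bool = False,
                data_digest: bool = False) -> bytes:
    """Build an ICReq PDU (illustrative helper, not a library API)."""
    # Digest flags: bit 0 enables header digest, bit 1 enables data digest.
    digest = (1 if header_digest else 0) | (2 if data_digest else 0)
    # Common header: type, flags, header length, data offset, total length.
    common = struct.pack("<BBBBI", ICREQ_PDU_TYPE, 0, ICREQ_LEN, 0, ICREQ_LEN)
    # ICReq fields: PDU format version, data alignment, digests, max R2Ts.
    fields = struct.pack("<HBBI", 0, hpda, digest, maxr2t)
    # The remaining 112 bytes are reserved and sent as zeros.
    return common + fields + bytes(112)


def send_icreq(host: str, port: int = 4420, timeout: float = 5.0) -> bytes:
    """Open a plain TCP connection, send ICReq, return the raw 128-byte reply.

    Hypothetical usage only: 'host' must be an NVMe/TCP target you control.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_icreq())
        reply = b""
        while len(reply) < ICREQ_LEN:
            chunk = sock.recv(ICREQ_LEN - len(reply))
            if not chunk:
                break  # peer closed the connection early
            reply += chunk
        return reply
```

Nothing here needs an RDMA-capable NIC or a lossless network; the handshake travels over the same socket API that any other TCP application uses.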

NVMe/TCP is fast (especially if using Marvell FastLinQ 10/25/50/100GbE NICs, as they have a built-in full offload for NVMe/TCP), it leverages existing infrastructure and it keeps things inherently simple. That is a beautiful prospect for any technologist, and it is also attractive to company CIOs worried about budgets.

So, once again, simplicity wins in the long run!

[1] https://nvmexpress.org/welcome-nvme-tcp-to-the-nvme-of-family-of-transports/
