Sleepless nights and headaches over the possibility of making a bad investment decision end here. When it comes to investing and building your network, we've got your back.
You’re spoiled for choice as an operator with plenty of commercial off-the-shelf (COTS) products to choose from. But what servers should you actually buy for your network functions virtualization infrastructure (NFVI)? Do you feel like an expert? Are the COTS vendors experts? They’ve been involved in building cloud infrastructure for giants like Amazon and Facebook so they should know, right?
NFV is a different story. For data plane applications, which is our main focus here, the architecture is very different (control plane is really no different than your average web application for selling toasters, so we’ll leave that for now). The amount of network input/output (I/O) to process is what drives this difference, so if you’re not careful about your NFVI investment, you’ll be spending a lot of money on CPU cycles and memory that are sitting there, twiddling their thumbs. Anecdotally, we have operators paying $20,000 to $50,000 per server, with enough CPU and memory to make supercomputers blush; these operators are connecting just 2 x 10GE or 2 x 40GE of I/O to it. That’s what inspired us to write this blog piece – bad investment decisions.
The two biggest cost components of a server today are CPU and memory. You want to make sure you get the right amount, but what’s a winning recipe? The amount of traffic in operator networks is increasing rapidly (call it +50% year on year), just like before – but the amount of CPU required to process each gigabit is not increasing at all and is unlikely to ever do so. In fact, even over the lifespan of the COTS servers you are investing in, the CPU required per gigabit is likely to decrease as software is optimized.
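To make the growth math concrete, here is a minimal Python sketch of what +50% year on year does to a traffic forecast. The 40 Gbps starting point and the five-year server lifespan are illustrative assumptions, not measurements, and the function name is ours.

```python
def projected_traffic_gbps(current_gbps: float, years: int,
                           annual_growth: float = 0.50) -> list:
    """Return the traffic for year 0..years, compounding annual_growth yearly."""
    return [current_gbps * (1 + annual_growth) ** year
            for year in range(years + 1)]


if __name__ == "__main__":
    # Hypothetical starting point: 40 Gbps today, five-year server lifespan.
    for year, gbps in enumerate(projected_traffic_gbps(40.0, 5)):
        print(f"year {year}: {gbps:.1f} Gbps")
```

Because the CPU needed per gigabit stays flat at worst, the CPU you need grows no faster than these traffic numbers do.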
So get your mix right from the start, and you should be okay. Still think it’s easy?
But there are so many choices!
Yes, and there’s more than one trap you can fall into here. The important thing is that you understand what you’re choosing and why. But we do the legwork so you don’t need to.
The first thing we’ll look at is the choice of virtual switch and how that affects your application performance. We've been looking at Wind River AVS, Juniper Contrail vRouter and OVS-DPDK. The switch you choose will impact feature availability, performance and, ultimately, total cost of ownership (TCO). Our math shows that at least 30% of your CPU will be required for the virtual switching platform, no matter your choice.
In the tables above, we’ve tested a typical operator traffic mix model, which is always going to be different from your average packet blasting test. The numbers come from an exhaustive test, where we tried to get the most out of a single server by finding the best-balanced split between the CPUs allocated to the vSwitch and those allocated to our VNF. Juniper Contrail’s vRouter suffers significantly from the fact that the traffic includes a realistic flow setup rate. OVS-DPDK and AVS are not affected by this, which is due to their more stateless design. On the flip side, vRouter gives you service function chaining out of the box, which may be why it needs per-flow state.
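As a rough illustration of that balancing act, the sketch below reserves a share of a server's cores for the vSwitch and hands the rest to the VNF. It treats the 30% figure as a floor and rounds up to whole cores. The core counts are hypothetical and the function name is ours; real tuning also depends on NUMA placement and the traffic model.

```python
def split_cores(total_cores: int, vswitch_percent: int = 30) -> tuple:
    """Return (vswitch_cores, vnf_cores), rounding the vSwitch share up."""
    if total_cores < 1 or not 0 <= vswitch_percent <= 100:
        raise ValueError("need at least one core and a percentage in 0..100")
    # Ceiling division on integers avoids floating-point rounding surprises.
    vswitch_cores = -(-total_cores * vswitch_percent // 100)
    return vswitch_cores, total_cores - vswitch_cores


if __name__ == "__main__":
    # Hypothetical dual-socket servers with 20 and 36 physical cores in total.
    for cores in (20, 36):
        vswitch, vnf = split_cores(cores)
        print(f"{cores} cores: {vswitch} for the vSwitch, {vnf} for the VNF")
```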
Other vSwitch Considerations
The AVS is easy to set up, works out of the box and does what it should – impressively simple. But if you spend some time with OVS-DPDK, it will perform better than the AVS. With some time and tweaking, you’ll find additional mechanisms to get increased performance, more control of the resources assigned to the switch, and the ability to customize the behavior of the OVS for your NFV application. All of this, together with deep troubleshooting, statistics capabilities, and the power of the open-source community, makes your time with OVS-DPDK well spent. Just don’t underestimate the complexity or the time needed to learn OVS-DPDK setup; it’s significant.
Most of the differences between the vSwitches are much harder to quantify and explain than the pure performance. First of all, most of the feature difference is neutralized when you go to an OpenStack-based system, because OpenStack and Neutron do not know what the features are or how to configure them. It also means that you need to have a good idea of how to configure your tenant networks, or how to get the data plane traffic to your VNF in the most efficient way. Our first conclusion is that the more functionality a vSwitch has, the less performance it delivers (which is evident from the vRouter results above). And as we’ve mentioned before, "as stupid as possible" is preferable when it comes to switches. We’ll show you.
VNF under test
In our testing, we’ve run one of the most sophisticated and feature-rich VNFs available – the Sandvine data plane VNF. It should serve well as your benchmark. Will every data plane VNF perform exactly the same? No, there are hundreds of factors that affect performance: the choice of CPU, the amount and speed of your memory, the choice of NIC and vSwitch, the traffic model and how it’s balanced in the NIC and across the switches, the performance, NUMA model and threading model of your VNF, the number of queues, queue lengths, the bursty behavior of the traffic and the application, etc. Hence, we have to simplify our model, but simplify it in a way that applies to most operators’ NFV situations.
CPU — Shop for cycles and frequency
You want the most CPU cycles per dollar spent you can find, but you will also find that most VNFs will do better with a higher CPU frequency rather than with the maximum number of cores. CPU manufacturers tend to raise the price steeply as you approach the top of the range, both in clock frequency and in core count, and you certainly don’t want to be at the extreme end of either of the two.
Is it worth going to the newest CPU generation? Probably, more so for your electricity bill than for better cost performance. Since data plane VNF performance is so I/O and memory bound, it’s rare that you will benefit as much as other applications from the features of the new architecture. But it does happen, especially when the new CPU architecture comes with changes to the memory infrastructure, like the upgrade to Skylake (Xeon Scalable) from Haswell (https://www.sandvine.com/press-releases/blog/sandvine-virtual-series-architecture-delivers-60-more-packet-processing-power-with-intel-xeon-scalable-processor).
New CPU architectures and the servers they come on tend to be very expensive at the beginning of the cycle. Just remember, don't break the bank. At the end of the day, you want the most cycles per dollar you can get, with a slight emphasis on higher-clocked CPUs.
Network I/O — The bigger the better
The cost to carry traffic over 100G ports is far less than over 10G ports these days, everything included. You may still want to build your servers with a few 10G ports for out-of-band management and other low-capacity network requirements, but for the data plane, which carries everything else, you want to go 100G today. Some of the reasons for this are not obvious.
The 10G NIC chips are old and very thin on features; 100G NIC chips are designed to help the CPU process much higher throughput rates. One of the most important features here is receiving and balancing the packets into several RX queues, which in turn helps the VNF parallelize the processing by assigning a CPU thread per RX queue. Some 10G NICs have such features too (receive side scaling, or RSS), but they are designed for endpoint applications such as a web server, not for NFV payloads. So the cost of processing a gigabit on a 100G NIC is substantially lower than the same gigabit on a 10G NIC.
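To illustrate the idea of balancing packets into RX queues, here is a small Python sketch that maps each flow's 5-tuple to a queue index. It is a simplification under stated assumptions: CRC32 is only a stand-in for the keyed Toeplitz hash that NICs typically compute in hardware, and the function name, addresses and queue count are hypothetical.

```python
import zlib
from collections import Counter


def rx_queue_for_flow(src_ip, dst_ip, src_port, dst_port, protocol, num_queues):
    """Map a flow's 5-tuple to an RX queue index in [0, num_queues)."""
    if num_queues < 1:
        raise ValueError("num_queues must be at least 1")
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{protocol}".encode("ascii")
    # Assumption: CRC32 stands in for the NIC's hardware hash function.
    return zlib.crc32(key) % num_queues


if __name__ == "__main__":
    # 10,000 synthetic flows that differ only in source port, over 8 queues.
    counts = Counter(
        rx_queue_for_flow("10.0.0.1", "192.0.2.10", port, 443, "tcp", 8)
        for port in range(1024, 11024)
    )
    for queue in sorted(counts):
        print(f"queue {queue}: {counts[queue]} flows")
```

Every packet of a given flow hashes to the same queue, so one CPU thread per queue can process its flows without sharing per-flow state with the other threads.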
Some of these 100G NIC load balancing features are non-trivial to get to work when your solution is based on OpenStack and Neutron, so expect some effort to go into this piece of engineering before you’re done.
Memory — The more the merrier
If there’s an empty memory slot, fill it! If you leave slots unused, it means not all the memory controllers on the CPU will be used – and you’ll be losing memory performance. It is likely that your VNF application will want as much memory performance as you can give it, but that doesn’t mean you should buy the biggest dual in-line memory modules (DIMMs); they just drive up the cost for no good reason. A simple router will hardly use any memory, while a sophisticated DPI engine will be very hungry.
Data plane VNFs come in different shapes but 512GB is more than enough memory today.
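One way to act on "fill every slot, but skip the biggest DIMMs" is to pick the smallest module size that reaches your capacity target with all slots populated. The sketch below does that. The list of DIMM sizes and the 24-slot layout (2 sockets x 6 channels x 2 slots) are assumptions for illustration, so check your own server's specification.

```python
# Assumed catalogue of DIMM sizes in GB; adjust to what your vendor sells.
DIMM_SIZES_GB = (8, 16, 32, 64, 128)


def smallest_dimm_filling_all_slots(target_gb: int, slots: int) -> int:
    """Return the smallest DIMM size (GB) that meets target_gb with every slot filled."""
    for size_gb in DIMM_SIZES_GB:
        if size_gb * slots >= target_gb:
            return size_gb
    raise ValueError("target capacity exceeds the largest DIMM in every slot")


if __name__ == "__main__":
    slots = 2 * 6 * 2  # hypothetical: sockets x channels x slots per channel
    for target_gb in (256, 512):
        size_gb = smallest_dimm_filling_all_slots(target_gb, slots)
        print(f"target {target_gb} GB: {slots} x {size_gb} GB = {size_gb * slots} GB")
```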
Does a data plane VNF care about storage performance?
Well, it could. The strictest requirements of storage performance come from analytics and/or statistics applications. So, if your data plane VNF is capturing packets and storing them to disk, or something along those lines, it cares at least a little. The numbers will depend on data type, database, disk write block size and so on, but the general requirement is to use a Cinder volume attached as a separate virtual disk. We would advise you to do your high-performance and high-capacity storage off the server or blade, using SAN technology. This will make migration, dimensioning, and scale-in/out much easier further down the road.
The benefits of cloud are many and well known to most, but like traveling to Mars, it’s not free to get there. Skimping on NIC ports is not the way to get there cheaper. Our recommendation is that you dimension for at least 200 Gbps of throughput for a high-end, dual-socket server. But that network I/O could come through a large number of different ports, so go with a minimum of 4 x 100GE ports for data plane and a minimum of 4 x 10GE for management connectivity.
Whatever you choose, do not spend more than $10,000 on each of these servers; if you do, you’re doing it wrong.
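To see why, divide the server price by the network I/O it actually carries. The short Python sketch below does this for the figures quoted in this post. The pairing of each price with a port configuration is our own illustrative assumption, as is the function name.

```python
def dollars_per_gbps(server_cost_usd: float, io_gbps: float) -> float:
    """Return the server cost divided by its attached network I/O in Gbps."""
    if io_gbps <= 0:
        raise ValueError("io_gbps must be positive")
    return server_cost_usd / io_gbps


if __name__ == "__main__":
    # Prices and I/O figures come from this post; the pairings are assumptions.
    scenarios = {
        "$20,000 server with 2 x 10GE": (20_000, 2 * 10),
        "$50,000 server with 2 x 40GE": (50_000, 2 * 40),
        "$10,000 server dimensioned for 200 Gbps": (10_000, 200),
    }
    for label, (cost_usd, gbps) in scenarios.items():
        print(f"{label}: ${dollars_per_gbps(cost_usd, gbps):,.0f} per Gbps")
```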