Network Processing Unit
Maximizing Performance with Processor and Core Affinity
Stateful devices perform multiple memory operations per packet (flow-state lookups, signature analysis, stateful modifications, counter updates, and so on), and these operations depend on state established by previous packets. Each memory operation in turn targets a specific memory controller, location, and cache. If all processors access the same memory locations, a severe bottleneck is created by cache pollution and by latency and interlocking on the interconnect bus.
When designing a stateful packet-processing deployment, the performance of Intel®-based devices can vary widely due to the impact of QPI (QuickPath Interconnect) memory checks and local cache pollution.
To maximize performance, a policy control solution must maintain core affinity (which in turn guarantees processor affinity). Only in this way can policy control be applied in a “shared nothing” architecture that avoids latency-introducing memory references, whether across the QPI or to another core's local cache.
QuickPath Interconnect: Considerations in Packet Processing Applications
An issue of critical importance to stateful packet-processing applications, including deep packet inspection (DPI) and network policy control, is Intel's QuickPath Interconnect (QPI) architecture.
Sandvine's Network Processing Unit
The only way to completely avoid QPI memory checks in a packet-processing application is to ensure that all packets associated with a flow, session, or subscriber are processed by the same CPU. To achieve this result, two conditions must be met:
- There must be an aggregation solution that resolves network asymmetry by ensuring all packets relating to a particular flow, session, or subscriber reach the same packet-processing device
- The packet-processing device must include functionality that specifically directs associated packets to a common processor core
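The second condition can be illustrated with a direction-independent hash of a flow's endpoints. This is a hypothetical sketch of the principle, not the NPU's actual algorithm (`flow_to_core` and the mixing constant are assumptions): by ordering the two endpoints canonically before hashing, both directions of a flow map to the same core index.

```c
/* Hypothetical sketch: a direction-independent flow hash that sends
 * both directions of a conversation to the same core. */
#include <stdint.h>

#define NUM_CORES 16

static unsigned flow_to_core(uint32_t src_ip, uint32_t dst_ip,
                             uint16_t src_port, uint16_t dst_port)
{
    /* Combine each (address, port) endpoint into one value. */
    uint64_t a = ((uint64_t)src_ip << 16) | src_port;
    uint64_t b = ((uint64_t)dst_ip << 16) | dst_port;

    /* Canonical ordering: swapping source and destination leaves
     * (lo, hi) unchanged, so A->B and B->A hash identically. */
    uint64_t lo = a < b ? a : b;
    uint64_t hi = a < b ? b : a;

    uint64_t h = lo * 0x9E3779B97F4A7C15ULL ^ hi;  /* simple mix */
    return (unsigned)(h % NUM_CORES);
}
```

Because the hash is symmetric, the core that owns a flow sees the full conversation, and no other core ever needs to touch that flow's state.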
In Sandvine’s architecture, the first condition is met by the Policy Traffic Switch (PTS), the PCEF/TDF component of our solution. A comprehensive explanation of how the PTS overcomes routing asymmetry is available in the technology showcase Policy Traffic Switch Clusters: Overcoming Routing Asymmetry and Achieving Scale.
To meet the second condition, Sandvine has created a network processing unit (NPU). The NPU is the first point of examination for incoming packets, and is dedicated to maintaining flow, session, and subscriber affinity for maximum element throughput. To maintain processor affinity, the NPU ensures that all the packets corresponding to a flow, session, and subscriber are always processed by the same policy processing unit (PPU).
The net result is that the NPU delivers all packets corresponding to a particular flow, session, or subscriber to a specific core:
- In order: the packets are never reordered
- Symmetrically: the core sees both directions of traffic flow
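These two properties can be sketched with a single-dispatcher, per-core FIFO model (hypothetical, for illustration; the `dispatch` helper and queue sizes are assumptions): because one dispatcher appends every packet of a flow to the same queue, the owning core dequeues them in arrival order and sees both directions of the flow.

```c
/* Hypothetical sketch: one FIFO queue per core, filled by a single
 * dispatcher, so each core receives its flows' packets in order. */
#include <stddef.h>
#include <stdint.h>

#define NUM_CORES 4
#define QUEUE_DEPTH 1024

struct pkt { uint64_t flow_id; uint32_t seq; };

/* Single-producer FIFO per core: the dispatcher is the only writer,
 * so a flow's packets are dequeued in the order they arrived. */
struct core_queue {
    struct pkt ring[QUEUE_DEPTH];
    size_t head, tail;
} queues[NUM_CORES];

/* Append a packet to the queue of the core that owns its flow. A
 * direction-independent flow ID stands in for the symmetric hash. */
static int dispatch(struct pkt p)
{
    struct core_queue *q = &queues[p.flow_id % NUM_CORES];
    size_t next = (q->tail + 1) % QUEUE_DEPTH;
    if (next == q->head)
        return -1;              /* queue full: apply back-pressure */
    q->ring[q->tail] = p;
    q->tail = next;
    return 0;
}
```

Per-core single-writer queues avoid locks entirely, which is the same "shared nothing" property the affinity design aims for.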
This paper details Sandvine's approach to ensuring that all packets relating to a particular flow, session, or subscriber are presented in order, and symmetrically, to a single processing core.