It has been more than five years since DPDK was launched, and it has come a long way in delivering performance. Breaking the myth that the Linux kernel has the best networking stack, DPDK proved that the same hardware can handle 10 times more packets when the kernel networking stack is bypassed and packets are handled purely in user space.
It took NIC vendors a couple of years to realize that improving their kernel drivers was never going to bring them anywhere near DPDK performance.
- So came the era of developing and releasing Poll Mode Drivers.
Mellanox was the biggest beneficiary of this. The startup launched products targeting OpenStack, NFV and VNF. The underlying products were DPDK-supported, high-performance user-space drivers (aka Poll Mode Drivers).
After 2018 came a period in which think tanks started discussing packet processing in the hardware itself.
Use case: offload OVS rules to NIC hardware
That is, offloading OpenFlow rules onto the hardware itself. Cavium was one of the early adopters of this (LiquidIO and LiquidSecurity) but did not have much success.
Crypto processing, NAT and firewall rules were also being offloaded to the hardware.
So far, the most successful implementation has been offloading OpenFlow rules.
Hence the new design came into OVS: OVS-Offload (Open vSwitch). This feature was named rte_flow (the programmer's nomenclature); for VPs and CTOs it is “Flow-Director”.
WHAT IS RTE-FLOW IN DPDK?
RTE stands for “Run Time Environment”. DPDK core functions/APIs start with the ‘rte’ prefix.
Rte-flow is the DPDK-defined way of representing a flow. The rte-flow APIs/functions and structure objects can be used to program a packet forwarding rule into the NIC hardware itself.
A common, and the best, use case is programming an OpenFlow rule into the hardware itself.
An (Open)Flow rule converted into the DPDK-defined format is called an rte-flow.
openflow-rule –(rule conversion logic )–>rte-flow
This DPDK way of representing a flow is called rte-flow: when an OpenFlow rule is converted into the DPDK-defined flow format, the resulting flow data structure is an rte-flow.
Once an rte-flow is formed, it can be passed to any PMD to be processed and programmed into the hardware.
An OVS flow consists of 2 data structures:
- 1. const struct match *match (Key)
- 2. struct nlattr *nl_actions (Action)
- Both the Key and the Action data are converted into intermediate data structures; for the Key this is called the pattern.
- The key parameters present in the match data structure are consumed by the parse_flow_match function, and its output is the pattern array. Each array index stores L2, L3 or L4 parameters.
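To make the "match key in, pattern array out" step concrete, here is a minimal sketch in plain C. Every name in it (`toy_match`, `toy_item`, `toy_parse_match`) is hypothetical and heavily simplified; these are stand-ins, not OVS's `struct match` or DPDK's pattern item type. The only idea borrowed from rte_flow is that a pattern is an array with one entry per protocol layer, closed by an END entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the OVS match key. */
struct toy_match {
    uint8_t  dst_mac[6];   /* L2 field */
    uint32_t src_ip;       /* L3 field */
    uint16_t dst_port;     /* L4 field; 0 means "do not match on L4" */
};

enum toy_item_type { TOY_ITEM_END = 0, TOY_ITEM_ETH, TOY_ITEM_IPV4, TOY_ITEM_UDP };

/* One pattern entry: which layer it describes plus a pointer to its fields. */
struct toy_item {
    enum toy_item_type type;
    const void *spec;
};

/*
 * Walk the key layer by layer and emit one pattern entry per layer,
 * then terminate the array with an END entry.
 * Returns the number of entries written (END included), or -1 if the
 * caller's array cannot hold the worst case of 4 entries.
 */
int toy_parse_match(const struct toy_match *m, struct toy_item *pattern, size_t cap)
{
    size_t n = 0;

    assert(m != NULL && pattern != NULL);
    if (cap < 4)
        return -1;

    pattern[n++] = (struct toy_item){ TOY_ITEM_ETH, m->dst_mac };
    pattern[n++] = (struct toy_item){ TOY_ITEM_IPV4, &m->src_ip };
    if (m->dst_port != 0)
        pattern[n++] = (struct toy_item){ TOY_ITEM_UDP, &m->dst_port };
    pattern[n++] = (struct toy_item){ TOY_ITEM_END, NULL };

    return (int)n;
}
```

A consumer of such an array does not need a separate length: it can walk the entries until it reaches the END type, which is why the terminator is emitted even when the L4 entry is skipped.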
rte_flow_create is PMD specific. By that I mean an rte-flow is meant to be offloaded to one specific NIC, and that NIC is represented and controlled by its specific Poll Mode Driver. The application calls rte_flow_create for a port, and the PMD behind that port provides the implementation (written here as pmd_rte_flow_create).
Given this brief introduction, the rest of this document dives deep into the design and code.
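One way to picture "generic call, driver-specific implementation" is a per-port table of function pointers. The sketch below is a self-contained model under that assumption; `toy_flow_ops`, `toy_port`, `toy_flow_create` and the `fake_nic` driver are all hypothetical names invented for illustration and are not DPDK definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Handle returned to the application for an installed rule. */
struct toy_flow { int hw_rule_id; };

/* Callbacks a driver fills in (hypothetical ops table). */
struct toy_flow_ops {
    struct toy_flow *(*create)(void *dev_private, int rule_key);
};

/* A port couples driver-private state with that driver's callbacks. */
struct toy_port {
    const struct toy_flow_ops *flow_ops;  /* NULL: driver has no flow offload */
    void *dev_private;
};

/* Generic entry point: finds the driver behind the port and delegates. */
struct toy_flow *toy_flow_create(struct toy_port *port, int rule_key)
{
    assert(port != NULL);
    if (port->flow_ops == NULL || port->flow_ops->create == NULL)
        return NULL;  /* this port cannot offload flows */
    return port->flow_ops->create(port->dev_private, rule_key);
}

/* A pretend driver with room for four hardware rules. */
struct fake_nic { struct toy_flow slots[4]; int used; };

static struct toy_flow *fake_nic_flow_create(void *dev_private, int rule_key)
{
    struct fake_nic *nic = dev_private;

    if (nic->used == 4)
        return NULL;  /* rule table full */
    struct toy_flow *f = &nic->slots[nic->used];
    f->hw_rule_id = rule_key;
    nic->used++;
    return f;
}

const struct toy_flow_ops fake_nic_flow_ops = { .create = fake_nic_flow_create };
```

With this shape the application never names the driver: it passes a port to `toy_flow_create`, and a port whose driver lacks the callback simply gets NULL back, which the caller can treat as "fall back to software".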
- Any DPDK application calls rte_eal_init(). This function does the memory initialization, identifies the suitable PMD and initializes the device. The Rx/Tx ring buffers are then configured by the application through the ethdev queue setup calls.
- Any PMD Will implement a list of functions like:
- And many more. These are nothing but NIC-card functions, e.g.
The Device_recv function receives a packet from the hardware and puts it in an rte_mbuf pointer structure.
The same goes for packet transmit: Device_transmit transmits the packet buffer into the hardware.
In order to keep the code neat and simplify the implementation, any application simply calls rte_eth_tx_buffer.
rte_eth_tx_buffer(pkt) → Device_transmit(pkt)
This is DPDK's generic packet transmit function.
In the reverse path it is
rte_eth_rx_burst () → Device_recv ()
To transmit or receive multiple packets in a single call:
- rte_eth_tx_burst (pkt_array [32/64])
- rte_eth_rx_burst (pkt_array [32/64])
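The relationship between the generic burst calls and the device functions can be modelled in a few lines of plain C. This is only a sketch: the `toy_` types and the loopback "device" are hypothetical and do not use DPDK headers. What it does mirror from the real burst calls is the convention that each call returns how many packets were actually handled, which may be fewer than requested.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_pkt { uint16_t len; };

/* Per-driver receive/transmit callbacks (hypothetical). */
struct toy_dev_ops {
    uint16_t (*rx_burst)(void *hw, struct toy_pkt **pkts, uint16_t n);
    uint16_t (*tx_burst)(void *hw, struct toy_pkt **pkts, uint16_t n);
};

struct toy_eth_dev { const struct toy_dev_ops *ops; void *hw; };

/* Generic wrappers: the application calls these, never the driver directly. */
uint16_t toy_eth_rx_burst(struct toy_eth_dev *dev, struct toy_pkt **pkts, uint16_t n)
{
    assert(dev != NULL && dev->ops != NULL);
    return dev->ops->rx_burst(dev->hw, pkts, n);
}

uint16_t toy_eth_tx_burst(struct toy_eth_dev *dev, struct toy_pkt **pkts, uint16_t n)
{
    assert(dev != NULL && dev->ops != NULL);
    return dev->ops->tx_burst(dev->hw, pkts, n);
}

/* Pretend hardware: transmitted packets become receivable, in order. */
#define TOY_RING_SIZE 8
struct toy_loopback { struct toy_pkt *ring[TOY_RING_SIZE]; uint16_t count; };

static uint16_t toy_loopback_tx(void *hw, struct toy_pkt **pkts, uint16_t n)
{
    struct toy_loopback *lb = hw;
    uint16_t sent = 0;

    /* Accept packets until the ring is full; the caller sees the shortfall. */
    while (sent < n && lb->count < TOY_RING_SIZE)
        lb->ring[lb->count++] = pkts[sent++];
    return sent;
}

static uint16_t toy_loopback_rx(void *hw, struct toy_pkt **pkts, uint16_t n)
{
    struct toy_loopback *lb = hw;
    uint16_t got = 0;

    while (got < n && got < lb->count) {
        pkts[got] = lb->ring[got];
        got++;
    }
    /* Slide the packets that were not handed out to the front of the ring. */
    for (uint16_t i = got; i < lb->count; i++)
        lb->ring[i - got] = lb->ring[i];
    lb->count = (uint16_t)(lb->count - got);
    return got;
}

const struct toy_dev_ops toy_loopback_ops = {
    .rx_burst = toy_loopback_rx,
    .tx_burst = toy_loopback_tx,
};
```

Because the return value can be smaller than the requested count, a caller of the transmit wrapper has to decide what to do with the packets that were not accepted, typically retrying or freeing them.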
So far we have discussed how a PMD works and how packets are received and transmitted.
Up to this stage, all packets are handled in user space without any kernel involvement. The CPU takes the full load of packet processing, albeit the same workload processes 10x more packets than with the Linux kernel. Now, to reduce the packet processing load on the CPU even further, we need hardware acceleration.
NICs from Xilinx (UltraScale), Napatech and Intel support hardware acceleration.
What kind of Pkt Processing can be Offloaded?
GTP/GRE header manipulation
MPLS label add/strip
CRC and frame-integrity check.
And last: offloading the OVS rules into the hardware itself.
Forget about the last point (offloading the OVS rules into the hardware itself) for now. So how do we achieve the packet processing in 1-7?
One thing is clear: we need to instruct the NIC about the operation, and the NIC will then apply the rules/flow to each packet.
Something like this.
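As a rough software model of "program the rule once, let the NIC apply it to every packet", consider the toy match/action table below. It is purely illustrative: `toy_nic`, `toy_rule` and both functions are hypothetical, the match is on a single field (destination port), and nothing here talks to real hardware or uses the DPDK API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum toy_action { TOY_ACT_PASS = 0, TOY_ACT_DROP, TOY_ACT_TO_QUEUE };

/* One rule: match on destination port, then apply an action. */
struct toy_rule {
    uint16_t dst_port;
    enum toy_action action;
    uint16_t queue;        /* only meaningful for TOY_ACT_TO_QUEUE */
};

#define TOY_MAX_RULES 8
struct toy_nic { struct toy_rule rules[TOY_MAX_RULES]; size_t n_rules; };

/* Control path: the host programs a rule. Returns 0, or -1 if the table is full. */
int toy_nic_add_rule(struct toy_nic *nic, struct toy_rule rule)
{
    assert(nic != NULL);
    if (nic->n_rules == TOY_MAX_RULES)
        return -1;
    nic->rules[nic->n_rules++] = rule;
    return 0;
}

/*
 * Data path: applied to every packet without host involvement.
 * The first matching rule wins; packets matching no rule are passed through.
 */
enum toy_action toy_nic_classify(const struct toy_nic *nic, uint16_t dst_port,
                                 uint16_t *queue_out)
{
    assert(nic != NULL && queue_out != NULL);
    for (size_t i = 0; i < nic->n_rules; i++) {
        if (nic->rules[i].dst_port == dst_port) {
            *queue_out = nic->rules[i].queue;
            return nic->rules[i].action;
        }
    }
    *queue_out = 0;
    return TOY_ACT_PASS;
}
```

The split into a control-path function and a data-path function is the point of the sketch: the host pays the cost of describing the rule once, and the per-packet lookup is the part that hardware takes over.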