High Performance Networking with Holoscan¶
Authors: Alexis Girault (NVIDIA)
Supported platforms: x86_64, SBSA, IGX Orin (dGPU)
Last modified: March 17, 2025
Language: C++
Latest version: 0.1.0
Minimum Holoscan SDK version: 3.0.0
Tested Holoscan SDK versions: 3.0.0
Contribution metric: Level 1 - Highly Reliable
This tutorial demonstrates how to use the advanced networking Holoscan operator (often referred to as ANO or advanced_network in HoloHub) for low latency and high throughput communication through NVIDIA SmartNICs. With a properly tuned system, the advanced network operator can achieve hundreds of Gbps with latencies in the low microseconds.
Note
This solution is designed for users who want to create a Holoscan application that will interface with an external system or sensor over Ethernet.
- For high performance communication with systems also running Holoscan, refer to the Holoscan distributed application documentation instead.
- For JESD-compliant sensors without Ethernet support, consider the Holoscan Sensor Bridge for an FPGA-based interface to Holoscan.
Prerequisites¶
Achieving High Performance Networking with Holoscan requires a system with an NVIDIA SmartNIC and a discrete GPU. This is the case for NVIDIA Data Center systems, as well as edge systems like the NVIDIA IGX platform and the NVIDIA Project DIGITS. x86_64 systems equipped with these components are also supported, though the performance will vary greatly depending on the PCIe topology of the system (more on this below).
In this tutorial, we will be developing on an NVIDIA IGX Orin platform with IGX SW 1.1 and an NVIDIA RTX 6000 ADA GPU, which is the configuration that is currently actively tested. The concepts should be applicable to other systems based on Ubuntu 22.04 as well. It should also work on other Linux distributions with a glibc version of 2.35 or higher by containerizing the dependencies and applications on top of an Ubuntu 22.04 image, but this is not actively tested at this time.
Secure boot conflict
If you have secure boot enabled on your system, you might need to disable it as a prerequisite to run some of the configurations below (switching the NIC link layers to Ethernet, updating the MRRS of your NIC ports, updating the BAR1 size of your GPU). Secure boot can be re-enabled after the configurations are completed.
Background¶
Achieving high performance networking is a complex problem that involves many system components and configurations, which we will cover in this tutorial. Two of the core concepts involved are Kernel Bypass and GPUDirect.
Kernel Bypass¶
In this context, Kernel Bypass refers to bypassing the operating system's kernel to communicate directly with the network interface card (NIC), greatly reducing the latency and overhead of the Linux network stack. There are multiple technologies that achieve this in different fashions. They're all Ethernet-based, but differ in their implementation and features. The goal of the advanced_network operator in Holoscan Networking is to provide a common higher-level interface to all these backends:
- RDMA: Remote Direct Memory Access, using the open-source rdma-core library. It differs from the other Ethernet-based backends with its server/client model and its use of the RoCE (RDMA over Converged Ethernet) protocol. In exchange for the extra cost and complexity of setting it up on both ends, it offers a simpler user interface, orders packets on arrival, and is the only backend to offer a high reliability mode.
- DPDK: the Data Plane Development Kit is an open-source project, part of the Linux Foundation, with strong and long-lasting community support. Its RTE Flow capability is generally considered the most flexible solution for splitting ingress and egress packet data.
- DOCA GPUNetIO: This NVIDIA proprietary technology differs from the other backends by transmitting and receiving packets from the NIC using a GPU kernel instead of CPU code, which is highly beneficial for CPU-bound applications.
- NVIDIA Rivermax: NVIDIA's other proprietary kernel bypass technology. For a license fee, it should offer the lowest latency and lowest resource utilization for video streaming (RTP packets).
Work In Progress
The Holoscan Advanced Networking Operator integration testing infrastructure is under active development. As such:
- The DPDK backend is supported and distributed with the holoscan-networking package, and is the only backend actively tested at this time.
- The DOCA GPUNetIO backend is supported and distributed with the holoscan-networking package, with testing infrastructure under development.
- The NVIDIA Rivermax backend is supported for Rx only when building from source, but not yet distributed nor actively tested. Tx support is under development.
- The RDMA backend is under active development and should be available soon.
Which backend is best for your use case will depend on multiple factors, such as packet size, batch size, data type, and more. The goal of the Advanced Networking Operator is to abstract the interface to these backends, allowing developers to focus on the application logic and experiment with different configurations to identify the best technology for their use case.
GPUDirect¶
GPUDirect allows the NIC to read and write data from/to the GPU without copying the data through system memory, decreasing CPU overhead and significantly reducing latency. An implementation of GPUDirect is supported by all the kernel bypass backends listed above.
Warning
GPUDirect is only supported on Workstation/Quadro/RTX GPUs and Data Center GPUs. It is not supported on GeForce cards.
How does that relate to peermem or dma-buf?
There are two interfaces to enable GPUDirect:
- The nvidia-peermem kernel module, distributed with the NVIDIA DKMS GPU drivers.
  - Supported on Ubuntu kernels 5.4+, deprecated starting with kernel 6.8.
  - Supported on NVIDIA optimized Linux kernels, including IGX OS and DGX OS.
  - Supported by all MOFED drivers (requires rebuilding nvidia-dkms drivers afterwards).
- DMA Buf, supported on Linux kernels 5.12+ with NVIDIA open-source drivers 515+ and CUDA toolkit 11.7+.
1. Installing Holoscan Networking¶
We'll start by installing the holoscan-networking package, as it provides some utilities to help tune the system, and pulls in dependencies which will help us with the system setup.
First, add the DOCA apt repository which holds some of its dependencies, along with the CUDA repository if it is not already configured. The command blocks below differ by platform (aarch64/SBSA vs. x86_64 repository URLs), so use the variant matching your system:
export DOCA_URL="https://linux.mellanox.com/public/repo/doca/2.10.0/ubuntu22.04/arm64-sbsa/"
wget -qO- https://linux.mellanox.com/public/repo/doca/GPG-KEY-Mellanox.pub | gpg --dearmor - | sudo tee /etc/apt/trusted.gpg.d/GPG-KEY-Mellanox.pub > /dev/null
echo "deb [signed-by=/etc/apt/trusted.gpg.d/GPG-KEY-Mellanox.pub] $DOCA_URL ./" | sudo tee /etc/apt/sources.list.d/doca.list > /dev/null
sudo apt update
export DOCA_URL="https://linux.mellanox.com/public/repo/doca/2.10.0/ubuntu22.04/arm64-sbsa/"
wget -qO- https://linux.mellanox.com/public/repo/doca/GPG-KEY-Mellanox.pub | gpg --dearmor - | sudo tee /etc/apt/trusted.gpg.d/GPG-KEY-Mellanox.pub > /dev/null
echo "deb [signed-by=/etc/apt/trusted.gpg.d/GPG-KEY-Mellanox.pub] $DOCA_URL ./" | sudo tee /etc/apt/sources.list.d/doca.list > /dev/null
# Also need the CUDA repository for holoscan: https://developer.nvidia.com/cuda-downloads?target_os=Linux
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/sbsa/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt update
export DOCA_URL="https://linux.mellanox.com/public/repo/doca/2.10.0/ubuntu22.04/x86_64/"
wget -qO- https://linux.mellanox.com/public/repo/doca/GPG-KEY-Mellanox.pub | gpg --dearmor - | sudo tee /etc/apt/trusted.gpg.d/GPG-KEY-Mellanox.pub > /dev/null
echo "deb [signed-by=/etc/apt/trusted.gpg.d/GPG-KEY-Mellanox.pub] $DOCA_URL ./" | sudo tee /etc/apt/sources.list.d/doca.list > /dev/null
# Also need the CUDA repository for holoscan: https://developer.nvidia.com/cuda-downloads?target_os=Linux
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt update
You can then install holoscan-networking:
sudo apt install -y holoscan-networking
Alternatively, you can build the Holoscan Networking libraries and sample applications from source on HoloHub:
git clone git@github.com:nvidia-holoscan/holohub.git
cd holohub
./dev_container build_and_install holoscan-networking # Installed in ./install
If you'd like to generate the debian package from source and install it to ensure all dependencies are then present on your system, you can run:
./dev_container build_and_package holoscan-networking
sudo apt-get install ./holoscan-networking_*.deb # Installed in /opt/nvidia/holoscan
Refer to the HoloHub README for more information.
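To quickly confirm the installation, you can check the package status and list the bundled benchmark example used later in this tutorial (a minimal sanity check; the /opt/nvidia/holoscan path assumes the debian package install):
# Confirm the package is installed and see its version
dpkg -s holoscan-networking | grep -E "Status|Version"
# List the benchmark application and its configuration files
ls /opt/nvidia/holoscan/examples/adv_networking_bench/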
2. Required System Setup¶
2.1 Check your NIC drivers¶
Ensure your NIC drivers are loaded:
lsmod | grep ib_core
See an example output
This would be an expected output, where ib_core is listed on the left.
ib_core 442368 8 rdma_cm,ib_ipoib,iw_cm,ib_umad,rdma_ucm,ib_uverbs,mlx5_ib,ib_cm
mlx_compat 20480 11 rdma_cm,ib_ipoib,mlxdevm,iw_cm,ib_umad,ib_core,rdma_ucm,ib_uverbs,mlx5_ib,ib_cm,mlx5_core
If this is empty, install the latest OFED drivers from DOCA (the DOCA APT repository should already be configured from the Holoscan Networking installation above), and reboot your system:
sudo apt update
sudo apt install doca-ofed
sudo reboot
Note
If this is not empty, you can still install the newest OFED drivers from doca-ofed above. If you choose to keep your current drivers, install the following utilities for convenience later on. They include tools like ibstat, ibv_devinfo, ibdev2netdev, and mlxconfig:
sudo apt update
sudo apt install infiniband-diags ibverbs-utils mlnx-ofed-kernel-utils mft
Also upgrade the user space libraries to make sure your tools have all the symbols they need:
sudo apt install libibverbs1 librdmacm1 rdma-core
Running ibstat or ibv_devinfo will confirm your NIC interfaces are recognized by your drivers.
2.2 Switch your NIC Link Layers to Ethernet¶
NVIDIA SmartNICs can function in two separate modes (called link layer):
- Ethernet (ETH)
- Infiniband (IB)
To identify the current mode, run ibstat or ibv_devinfo and look for the Link Layer value.
ibv_devinfo
Couldn't load driver 'libmlx5-rdmav34.so'
If you see an error like this, you might have mismatched versions between your OFED tools and libraries. Try again after upgrading your user space libraries to match the version of your OFED tools like so:
sudo apt update
sudo apt install libibverbs1 librdmacm1 rdma-core
See an example output
In the example below, the mlx5_0 interface is in Ethernet mode, while the mlx5_1 interface is in InfiniBand mode. Do not pay attention to the transport value, which is always InfiniBand.
hca_id: mlx5_0
transport: InfiniBand (0)
fw_ver: 28.38.1002
node_guid: 48b0:2d03:00f4:07fb
sys_image_guid: 48b0:2d03:00f4:07fb
vendor_id: 0x02c9
vendor_part_id: 4129
hw_ver: 0x0
board_id: NVD0000000033
phys_port_cnt: 1
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
hca_id: mlx5_1
transport: InfiniBand (0)
fw_ver: 28.38.1002
node_guid: 48b0:2d03:00f4:07fc
sys_image_guid: 48b0:2d03:00f4:07fb
vendor_id: 0x02c9
vendor_part_id: 4129
hw_ver: 0x0
board_id: NVD0000000033
phys_port_cnt: 1
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: InfiniBand
For Holoscan Networking, we want the NIC to use the ETH link layer. To switch the link layer mode, there are two possible options:
- On IGX Orin developer kits, you can switch that setting through the BIOS: see IGX Orin documentation.
- On any system with an NVIDIA NIC (including the IGX Orin developer kits), you can run the commands below from a terminal:
- Identify the PCI address of your NVIDIA NIC:
nic_pci=$(sudo ibdev2netdev -v | awk '{print $1}' | head -n1)
# `0200` is the PCI-SIG class code for Ethernet controllers
# `0207` is the PCI-SIG class code for Infiniband controllers
# `15b3` is the Vendor ID for Mellanox
nic_pci=$(lspci -n | awk '($2 == "0200:" || $2 == "0207:") && $3 ~ /^15b3:/ {print $1; exit}')
- Set both link layers to Ethernet. LINK_TYPE_P1 and LINK_TYPE_P2 are for mlx5_0 and mlx5_1 respectively; you can choose to only set one of them. ETH or 2 is Ethernet mode, and IB or 1 is for InfiniBand.
sudo mlxconfig -d $nic_pci set LINK_TYPE_P1=ETH LINK_TYPE_P2=ETH
Apply with y.
See an example output
Device #1:
----------
Device type:    ConnectX7
Name:           P3740-B0-QSFP_Ax
Description:    NVIDIA Prometheus P3740 ConnectX-7 VPI PCIe Switch Motherboard; 400Gb/s; dual-port QSFP; PCIe switch5.0 X8 SLOT0 ;X16 SLOT2; secure boot;
Device:         0005:03:00.0

Configurations:          Next Boot    New
     LINK_TYPE_P1        ETH(2)       ETH(2)
     LINK_TYPE_P2        IB(1)        ETH(2)

Apply new Configuration? (y/n) [n] : y
Applying... Done!
-I- Please reboot machine to load new configurations.
Next Boot is the current value that was expected to be used at the next reboot. New is the value you're about to set to override Next Boot.
ERROR: write counter to semaphore: Operation not permitted
Disable secure boot on your system ahead of changing the link type of your NIC ports. It can be re-enabled afterwards.
- Reboot your system.
sudo reboot
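After the reboot, you can confirm that both ports now report Ethernet as their link layer (a quick check based on the ibv_devinfo output shown earlier):
ibv_devinfo | grep -i link_layer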
2.3 Configure the IP addresses of the NIC ports¶
First, we want to identify the logical names of your NIC interfaces. Connecting an SFP cable in just one of the ports of the NIC will help you identify which port is which. Run the following command once the cable is in place:
ibdev2netdev
See an example output
In the example below, only mlx5_1 has a cable connected (Up), and its logical ethernet name is eth1:
$ ibdev2netdev
mlx5_0 port 1 ==> eth0 (Down)
mlx5_1 port 1 ==> eth1 (Up)
ibdev2netdev does not show the NIC
If you have a cable connected but it does not show Up/Down in the output of ibdev2netdev, you can try to parse the output of dmesg instead. The example below shows that 0005:03:00.1 is plugged in, and that it is associated with eth1:
$ sudo dmesg | grep -w mlx5_core
...
[ 11.512808] mlx5_core 0005:03:00.0 eth0: Link down
[ 11.640670] mlx5_core 0005:03:00.1 eth1: Link down
...
[ 3712.267103] mlx5_core 0005:03:00.1: Port module event: module 1, Cable plugged
The next step is to set a static IP on the interface you'd like to use so you can refer to it in your Holoscan applications. First, check if you already have any addresses configured using the ethernet interface names identified above (in our case, eth0 and eth1):
ip -f inet addr show eth0
ip -f inet addr show eth1
If nothing appears, or you'd like to change the address, you can set an IP address through the Network Manager user interface, CLI (nmcli), or other IP configuration tools. In the example below, we configure the eth0 interface with an address of 1.1.1.1/24, and the eth1 interface with an address of 2.2.2.2/24.
sudo ip addr add 1.1.1.1/24 dev eth0
sudo ip addr add 2.2.2.2/24 dev eth1
Set these variables to your desired values:
if_name=eth0
if_static_ip=1.1.1.1/24
Update the IP with nmcli:
sudo nmcli connection modify $if_name ipv4.addresses $if_static_ip
sudo nmcli connection up $if_name
Create a network config file with the static IP:
cat << EOF | sudo tee /etc/systemd/network/20-$if_name.network
[Match]
MACAddress=$(cat /sys/class/net/$if_name/address)
[Network]
Address=$if_static_ip
EOF
Apply now:
sudo systemctl restart systemd-networkd
Note
If you are connecting the NIC to another NIC with an interconnect, do the same on the other system with an IP address on the same network segment.
For example, to communicate with 1.1.1.1/24 above (/24 -> 255.255.255.0 subnet mask), set up your other system with an IP between 1.1.1.2 and 1.1.1.254, and the same /24 subnet mask.
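Once both systems are configured, a simple ping from the other system can confirm that the link and addressing are correct (a sketch; 1.1.1.1 is the example address configured above, and eth0 stands in for the other system's interface name):
# Run on the other system: ping the static IP configured above through its connected port
ping -c 3 -I eth0 1.1.1.1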
2.4 Enable GPUDirect¶
Assuming you already have NVIDIA drivers installed, check if the nvidia_peermem kernel module is loaded:
sudo /opt/nvidia/holoscan/bin/tune_system.py --check topo
cd holohub
sudo ./operators/advanced_network/python/tune_system.py --check topo
See an example output
2025-03-12 14:15:07 - INFO - GPU 0: NVIDIA RTX A6000 has GPUDirect support.
2025-03-12 14:15:27 - INFO - nvidia-peermem module is loaded.
lsmod | grep nvidia_peermem
If it's not loaded, run the following command, then check again:
sudo modprobe nvidia_peermem
echo "nvidia-peermem" | sudo tee -a /etc/modules
sudo systemctl restart systemd-modules-load.service
Error loading the nvidia-peermem kernel module
If you run into an error loading the nvidia-peermem kernel module, follow these steps:
- Install the doca-ofed package to get the latest drivers for your NIC as documented above.
- Restart your system.
- Rebuild your NVIDIA drivers with DKMS like so:
peermem_ko=$(find /lib/modules/$(uname -r) -name "*peermem*.ko")
nv_dkms=$(dpkg -S "$peermem_ko" | cut -d: -f1)
sudo dpkg-reconfigure $nv_dkms
sudo modprobe nvidia_peermem
Why peermem and not dma buf?
peermem is currently the only GPUDirect interface supported by all our networking backends. This section therefore provides instructions for peermem and not dma-buf.
3. Optimal system configurations¶
Advanced
The section below is for advanced users looking to extract more performance out of their system. You can choose to skip this section and return to it later if the performance of your application is not satisfactory.
While the configurations above are the minimum requirements to get a NIC and an NVIDIA GPU to communicate while bypassing the OS kernel stack, performance can be further improved in most scenarios by tuning the system as described below.
Before diving into each of the setups below, note that we provide a utility script as part of the holoscan-networking package which gives an overview of the configurations that potentially need to be tuned on your system.
Work In Progress
This utility script is under active development and will be updated in future releases with additional checks, more actionable recommendations, and automated tuning.
sudo /opt/nvidia/holoscan/bin/tune_system.py --check all
cd holohub
sudo ./operators/advanced_network/python/tune_system.py --check all
See an example output
Our tuned IGX system with an RTX A6000 passes most of the checks:
2025-03-12 14:16:06 - INFO - CPU 0: Governor is correctly set to 'performance'.
2025-03-12 14:16:06 - INFO - CPU 1: Governor is correctly set to 'performance'.
2025-03-12 14:16:06 - INFO - CPU 2: Governor is correctly set to 'performance'.
2025-03-12 14:16:06 - INFO - CPU 3: Governor is correctly set to 'performance'.
2025-03-12 14:16:06 - INFO - CPU 4: Governor is correctly set to 'performance'.
2025-03-12 14:16:06 - INFO - CPU 5: Governor is correctly set to 'performance'.
2025-03-12 14:16:06 - INFO - CPU 6: Governor is correctly set to 'performance'.
2025-03-12 14:16:06 - INFO - CPU 7: Governor is correctly set to 'performance'.
2025-03-12 14:16:06 - INFO - CPU 8: Governor is correctly set to 'performance'.
2025-03-12 14:16:06 - INFO - CPU 9: Governor is correctly set to 'performance'.
2025-03-12 14:16:06 - INFO - CPU 10: Governor is correctly set to 'performance'.
2025-03-12 14:16:06 - INFO - CPU 11: Governor is correctly set to 'performance'.
2025-03-12 14:16:06 - INFO - cx7_0/0005:03:00.0: MRRS is correctly set to 4096.
2025-03-12 14:16:06 - INFO - cx7_1/0005:03:00.1: MRRS is correctly set to 4096.
2025-03-12 14:16:06 - WARNING - cx7_0/0005:03:00.0: PCIe Max Payload Size is not set to 256 bytes. Found: 128 bytes.
2025-03-12 14:16:06 - WARNING - cx7_1/0005:03:00.1: PCIe Max Payload Size is not set to 256 bytes. Found: 128 bytes.
2025-03-12 14:16:06 - INFO - HugePages_Total: 3
2025-03-12 14:16:06 - INFO - HugePage Size: 1024.00 MB
2025-03-12 14:16:06 - INFO - Total Allocated HugePage Memory: 3072.00 MB
2025-03-12 14:16:06 - INFO - Hugepages are sufficiently allocated with at least 500 MB.
2025-03-12 14:16:06 - INFO - GPU 0: SM Clock is correctly set to 1920 MHz (within 500 of the 2100 MHz theoretical Max).
2025-03-12 14:16:06 - INFO - GPU 0: Memory Clock is correctly set to 8000 MHz.
2025-03-12 14:16:06 - INFO - GPU 00000005:09:00.0: BAR1 size is 8192 MiB.
2025-03-12 14:16:06 - INFO - GPU GPU0 has at least one PIX/PXB connection to a NIC
2025-03-12 14:16:06 - INFO - isolcpus found in kernel boot line
2025-03-12 14:16:06 - INFO - rcu_nocbs found in kernel boot line
2025-03-12 14:16:06 - INFO - irqaffinity found in kernel boot line
2025-03-12 14:16:06 - INFO - Interface cx7_0 has an acceptable MTU of 9000 bytes.
2025-03-12 14:16:06 - INFO - Interface cx7_1 has an acceptable MTU of 9000 bytes.
2025-03-12 14:16:06 - INFO - GPU 0: NVIDIA RTX A6000 has GPUDirect support.
2025-03-12 14:16:06 - INFO - nvidia-peermem module is loaded.
Based on the results, you can figure out which of the sections below are appropriate to update configurations on your system.
3.1 Ensure ideal PCIe topology¶
Kernel bypass and GPUDirect rely on PCIe to communicate between the GPU and the NIC at high speeds. As such, the topology of the PCIe tree on a system is critical to ensure optimal performance.
Run the following command to check the GPUDirect communication matrix. You are looking for a PXB or PIX connection between the GPU and the NIC interfaces to get the best performance.
sudo /opt/nvidia/holoscan/bin/tune_system.py --check topo
cd holohub
sudo ./operators/advanced_network/python/tune_system.py --check topo
See an example output
On IGX developer kits, the board's internal switch is designed to connect the GPU to the NIC interfaces with a PXB
connection, offering great performance.
2025-03-06 12:07:45 - INFO - GPU GPU0 has at least one PIX/PXB connection to a NIC
nvidia-smi topo -mp
See an example output
On IGX developer kits, the board's internal switch is designed to connect the GPU to the NIC interfaces with a PXB
connection, offering great performance.
GPU0 NIC0 NIC1 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X PXB PXB 0-11 0 N/A
NIC0 PXB X PIX
NIC1 PXB PIX X
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NIC Legend:
NIC0: mlx5_0
NIC1: mlx5_1
If your connection is not optimal, you might be able to improve it by moving your NIC and/or GPU to a different PCIe slot, so that they can share a branch and do not need to go back to the Host Bridge (the CPU) to communicate. Refer to your system manufacturer's documentation, or run the following command to inspect the topology of your system:
lspci -tv
See an example output
Here is the PCIe tree of an IGX system. Note how the ConnectX-7 and RTX A6000 are connected to the same branch.
-+-[0007:00]---00.0-[01-ff]----00.0 Marvell Technology Group Ltd. 88SE9235 PCIe 2.0 x2 4-port SATA 6 Gb/s Controller
+-[0005:00]---00.0-[01-ff]----00.0-[02-09]--+-00.0-[03]--+-00.0 Mellanox Technologies MT2910 Family [ConnectX-7]
| | \-00.1 Mellanox Technologies MT2910 Family [ConnectX-7]
| +-01.0-[04-06]----00.0-[05-06]----08.0-[06]--
| \-02.0-[07-09]----00.0-[08-09]----00.0-[09]--+-00.0 NVIDIA Corporation GA102GL [RTX A6000]
| \-00.1 NVIDIA Corporation GA102 High Definition Audio Controller
+-[0004:00]---00.0-[01-ff]----00.0 Sandisk Corp WD PC SN810 / Black SN850 NVMe SSD
+-[0001:00]---00.0-[01-ff]----00.0-[02-fc]--+-01.0-[03-34]----00.0 Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
| +-02.0-[35-66]----00.0 Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
| +-03.0-[67-98]----00.0 Device 1c00:3450
| +-04.0-[99-ca]----00.0-[9a]--+-00.0 ASPEED Technology, Inc. ASPEED Graphics Family
| | \-02.0 ASPEED Technology, Inc. Device 2603
| \-05.0-[cb-fc]----00.0 Realtek Semiconductor Co., Ltd. RTL8822CE 802.11ac PCIe Wireless Network Adapter
\-[0000:00]-
x86_64 compatibility
Most x86_64 systems are not designed for this topology as they lack a discrete PCIe switch. In that case, the best connection they can achieve is NODE.
3.2 Check the NIC's PCIe configuration¶
Understanding PCIe Configuration for Maximum Performance - May 27, 2022
PCIe is used in any system for communication between different modules [including the NIC and the GPU]. This means that in order to process network traffic, the different devices communicating via the PCIe should be well configured. When connecting the network adapter to the PCIe, it auto-negotiates for the maximum capabilities supported between the network adapter and the CPU.
The instructions below help you determine whether your system is able to extract the maximum capabilities of your NIC; these values are not configurable. The two values that we are looking at here are the Max Payload Size (MPS - the maximum size of a PCIe packet) and the Speed (or PCIe generation).
Max Payload Size (MPS)¶
sudo /opt/nvidia/holoscan/bin/tune_system.py --check mps
cd holohub
sudo ./operators/advanced_network/python/tune_system.py --check mps
See an example output
The PCIe configuration on the IGX Orin developer kit is not able to leverage the max payload size of the NIC:
2025-03-10 16:15:54 - WARNING - cx7_0/0005:03:00.0: PCIe Max Payload Size is not set to 256 bytes. Found: 128 bytes.
2025-03-10 16:15:54 - WARNING - cx7_1/0005:03:00.1: PCIe Max Payload Size is not set to 256 bytes. Found: 128 bytes.
Identify the PCIe address of your NVIDIA NIC:
nic_pci=$(sudo ibdev2netdev -v | awk '{print $1}' | head -n1)
# `0200` is the PCI-SIG class code for NICs
# `15b3` is the Vendor ID for Mellanox
nic_pci=$(lspci -n | awk '$2 == "0200:" && $3 ~ /^15b3:/ {print $1}' | head -n1)
Check current and max MPS:
sudo lspci -vv -s $nic_pci | awk '/DevCap/{s=1} /DevCtl/{s=0} /MaxPayload /{match($0, /MaxPayload [0-9]+/, m); if(s){print "Max " m[0]} else{print "Current " m[0]}}'
See an example output
The PCIe configuration on the IGX Orin developer kit is not able to leverage the max payload size of the NIC:
Max MaxPayload 512
Current MaxPayload 128
Note
While your NIC might be capable of more, 256 bytes is generally the largest supported by any switch/CPU at this time.
PCIe Speed/Generation¶
Identify the PCIe address of your NVIDIA NIC:
nic_pci=$(sudo ibdev2netdev -v | awk '{print $1}' | head -n1)
# `0200` is the PCI-SIG class code for NICs
# `15b3` is the Vendor ID for Mellanox
nic_pci=$(lspci -n | awk '$2 == "0200:" && $3 ~ /^15b3:/ {print $1}' | head -n1)
Check current and max Speeds:
sudo lspci -vv -s $nic_pci | awk '/LnkCap/{s=1} /LnkSta/{s=0} /Speed /{match($0, /Speed [0-9]+GT\/s/, m); if(s){print "Max " m[0]} else{print "Current " m[0]}}'
See an example output
On IGX, the switch is able to maximize the NIC speed, both being PCIe 5.0:
Max Speed 32GT/s
Current Speed 32GT/s
3.3 Maximize the NIC's Max Read Request Size (MRRS)¶
Understanding PCIe Configuration for Maximum Performance - May 27, 2022
PCIe Max Read Request determines the maximal PCIe read request allowed. A PCIe device usually keeps track of the number of pending read requests due to having to prepare buffers for an incoming response. The size of the PCIe max read request may affect the number of pending requests (when using data fetch larger than the PCIe MTU).
Unlike the PCIe properties queried in the previous section, the MRRS is configurable. We recommend maxing it to 4096 bytes. Run the following to check your current settings:
sudo /opt/nvidia/holoscan/bin/tune_system.py --check mrrs
cd holohub
sudo ./operators/advanced_network/python/tune_system.py --check mrrs
Identify the PCIe address of your NVIDIA NIC:
nic_pci=$(sudo ibdev2netdev -v | awk '{print $1}' | head -n1)
# `0200` is the PCI-SIG class code for NICs
# `15b3` is the Vendor ID for Mellanox
nic_pci=$(lspci -n | awk '$2 == "0200:" && $3 ~ /^15b3:/ {print $1}' | head -n1)
Check current MRRS:
sudo lspci -vv -s $nic_pci | grep DevCtl: -A2 | grep -oE "MaxReadReq [0-9]+"
Update MRRS:
sudo /opt/nvidia/holoscan/bin/tune_system.py --set mrrs
cd holohub
sudo ./operators/advanced_network/python/tune_system.py --set mrrs
Note
This value is reset on reboot and needs to be set again every time the system boots.
ERROR: pcilib: sysfs_write: write failed: Operation not permitted
Disable secure boot on your system ahead of changing the MRRS of your NIC ports. It can be re-enabled afterwards.
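Since the MRRS resets at every boot, one option is to re-apply it automatically with a oneshot systemd service, following the same pattern as the cpu-performance and gpu-max-clocks services used later in this tutorial (a sketch; the nic-mrrs.service name is arbitrary, and the script path and direct execution assume the debian package install invoked as shown above):
# Create a oneshot service that re-applies the MRRS setting at boot
cat << EOF | sudo tee /etc/systemd/system/nic-mrrs.service
[Unit]
Description=Set NIC Max Read Request Size
After=multi-user.target
[Service]
Type=oneshot
ExecStart=/opt/nvidia/holoscan/bin/tune_system.py --set mrrs
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl enable nic-mrrs.service
sudo systemctl start nic-mrrs.service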
3.4 Enable Huge pages¶
Huge pages are a memory management feature that allows the OS to allocate large blocks of memory (typically 2MB or 1GB) instead of the default 4KB pages. This reduces the number of page table entries and the amount of memory used for translation, improving cache performance and reducing TLB (Translation Lookaside Buffer) misses, which leads to lower latencies.
While it is naturally beneficial for CPU packets, it is also needed when routing data packets to the GPU in order to handle metadata (mbufs) on the CPU.
We recommend installing the libhugetlbfs-bin package for the hugeadm utility:
sudo apt update
sudo apt install -y libhugetlbfs-bin
Then, check your huge page pools:
hugeadm --pool-list
See an example output
The example below shows that this system supports huge pages of 64K, 2M (default), 32M, and 1G, but that none of them are currently allocated.
Size Minimum Current Maximum Default
65536 0 0 0
2097152 0 0 0 *
33554432 0 0 0
1073741824 0 0 0
And your huge page mount points:
hugeadm --list-all-mounts
See an example output
The default huge pages are mounted on /dev/hugepages with a page size of 2M.
Mount Point Options
/dev/hugepages rw,relatime,pagesize=2M
First, check your huge page pools:
ls -1 /sys/kernel/mm/hugepages/
grep Huge /proc/meminfo
See an example output
The example below shows that this system supports huge pages of 64K, 2M (default), 32M, and 1G, but that none of them are currently allocated.
hugepages-1048576kB
hugepages-2048kB
hugepages-32768kB
hugepages-64kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 0 kB
And your huge page mount points:
mount | grep huge
See an example output
The default huge pages are mounted on /dev/hugepages with a page size of 2M.
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=2M)
As a rule of thumb, we recommend starting with 3 to 4 GB of total huge pages, with an individual page size of 500 MB to 1 GB (depending on system availability).
There are two ways to allocate huge pages:
- in the kernel bootline (recommended to ensure contiguous memory allocation) or
- dynamically at runtime (risk of fragmentation for large page sizes)
The example below allocates 3 huge pages of 1GB each.
Add the flags below to the GRUB_CMDLINE_LINUX variable in /etc/default/grub:
default_hugepagesz=1G hugepagesz=1G hugepages=3
Show explanation
- default_hugepagesz: the default huge page size to use, making them available from the default mount point, /dev/hugepages.
- hugepagesz: the size of the huge pages to allocate.
- hugepages: the number of huge pages to allocate.
Then rebuild your GRUB configuration and reboot:
sudo update-grub
sudo reboot
Allocate the 3x 1GB huge pages:
sudo hugeadm --pool-pages-min 1073741824:3
echo 3 | sudo tee /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
Create a mount point to access the 1GB huge pages pool since that is not the default size on that system. We will name it /mnt/huge here.
sudo mkdir -p /mnt/huge
sudo mount -t hugetlbfs -o pagesize=1G none /mnt/huge
echo "nodev /mnt/huge hugetlbfs pagesize=1G 0 0" | sudo tee -a /etc/fstab
sudo mount /mnt/huge
Note
If you work with containers, remember to mount this directory in your container as well with -v /mnt/huge:/mnt/huge.
Rerunning the initial commands should now list 3 huge pages of 1GB each. Note that 1GB will only be the default huge page size if it was set in the kernel bootline.
3.5 Isolate CPU cores¶
Note
This optimization is less impactful when using the gpunetio backend since the GPU polls the NIC.
The CPU interacting with the NIC to route packets is sensitive to perturbations, especially with smaller packet/batch sizes requiring more frequent work. Isolating a CPU in Linux prevents unwanted user or kernel threads from running on it, reducing context switching and latency spikes from noisy neighbors.
We recommend isolating the CPU cores you will select to interact with the NIC (defined in the advanced_network configuration described later in this tutorial). This is done by setting additional flags on the kernel bootline.
You can first check if any of the recommended flags were already set on the last boot:
sudo /opt/nvidia/holoscan/bin/tune_system.py --check cmdline
cd holohub
sudo ./operators/advanced_network/python/tune_system.py --check cmdline
cat /proc/cmdline | grep -e isolcpus -e irqaffinity -e nohz_full -e rcu_nocbs -e rcu_nocb_poll
Decide which cores to isolate based on your configuration. We recommend one core per queue as a rule of thumb. First, identify your core IDs:
cat /proc/cpuinfo | grep processor
See an example output
This system has 12 cores, numbered 0 to 11:
processor : 0
processor : 1
processor : 2
processor : 3
processor : 4
processor : 5
processor : 6
processor : 7
processor : 8
processor : 9
processor : 10
processor : 11
As an example, the line below will isolate cores 9, 10 and 11, leaving cores 0-8 free for other tasks and hardware interrupts:
isolcpus=9-11 irqaffinity=0-8 nohz_full=9-11 rcu_nocbs=9-11 rcu_nocb_poll
Show explanation
| Parameter | Description |
|---|---|
| isolcpus | Isolates specific CPU cores from the Linux scheduler, preventing regular system tasks from running on them. This ensures dedicated cores are available exclusively for your networking tasks, reducing context switches and interruptions that can cause latency spikes. |
| irqaffinity | Controls which CPU cores can handle hardware interrupts. By directing network interrupts away from your isolated cores, you prevent networking tasks from being interrupted by hardware events, maintaining consistent processing time. |
| nohz_full | Disables regular kernel timer ticks on specified cores when they're running user space applications. This reduces overhead and prevents periodic interruptions, allowing your networking code to run with fewer disturbances. |
| rcu_nocbs | Offloads Read-Copy-Update (RCU) callback processing from specified cores. RCU is a synchronization mechanism in the Linux kernel that can cause periodic processing bursts. Moving this work away from your networking cores helps maintain consistent performance. |
| rcu_nocb_poll | Works with rcu_nocbs to improve how RCU callbacks are processed on non-callback CPUs. This can reduce latency spikes by changing how the kernel polls for RCU work. |
Together, these parameters create an environment where specific CPU cores can focus exclusively on network packet processing with minimal interference from the operating system, resulting in lower and more consistent latency.
Add these flags to the GRUB_CMDLINE_LINUX variable in /etc/default/grub, then rebuild your GRUB configuration and reboot:
sudo update-grub
sudo reboot
Verify that the flags were properly set after boot by rerunning the check commands above.
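You can also inspect the kernel's runtime view of the isolation (a quick check; these sysfs files list the isolated and tickless cores, and should match the ranges you configured, e.g. 9-11 in the example above):
cat /sys/devices/system/cpu/isolated
cat /sys/devices/system/cpu/nohz_full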
3.6 Prevent CPU cores from going idle¶
When a core goes idle/to sleep, coming back online to poll the NIC can cause latency spikes and dropped packets. To prevent this, we recommend setting the scaling governor to performance for these CPU cores.
Note
Cores from a single cluster will always share the same governor.
Bug
We have witnessed instances where setting the governor to performance on only the isolated cores (dedicated to polling the NIC) does not lead to the expected performance gains. As such, we currently recommend setting the governor to performance for all cores, which has proven to be reliably effective.
Check the current governor for each of your cores:
sudo /opt/nvidia/holoscan/bin/tune_system.py --check cpu-freq
cd holohub
sudo ./operators/advanced_network/python/tune_system.py --check cpu-freq
See an example output
2025-03-06 12:20:27 - WARNING - CPU 0: Governor is set to 'powersave', not 'performance'.
2025-03-06 12:20:27 - WARNING - CPU 1: Governor is set to 'powersave', not 'performance'.
2025-03-06 12:20:27 - WARNING - CPU 2: Governor is set to 'powersave', not 'performance'.
2025-03-06 12:20:27 - WARNING - CPU 3: Governor is set to 'powersave', not 'performance'.
2025-03-06 12:20:27 - WARNING - CPU 4: Governor is set to 'powersave', not 'performance'.
2025-03-06 12:20:27 - WARNING - CPU 5: Governor is set to 'powersave', not 'performance'.
2025-03-06 12:20:27 - WARNING - CPU 6: Governor is set to 'powersave', not 'performance'.
2025-03-06 12:20:27 - WARNING - CPU 7: Governor is set to 'powersave', not 'performance'.
2025-03-06 12:20:27 - WARNING - CPU 8: Governor is set to 'powersave', not 'performance'.
2025-03-06 12:20:27 - WARNING - CPU 9: Governor is set to 'powersave', not 'performance'.
2025-03-06 12:20:27 - WARNING - CPU 10: Governor is set to 'powersave', not 'performance'.
2025-03-06 12:20:27 - WARNING - CPU 11: Governor is set to 'powersave', not 'performance'.
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
See an example output
In this example, all cores were defaulted to powersave instead of the recommended performance.
powersave
powersave
powersave
powersave
powersave
powersave
powersave
powersave
powersave
powersave
powersave
powersave
Install cpupower to more conveniently set the governor:
sudo apt update
sudo apt install -y linux-tools-$(uname -r)
Set the governor to performance for all cores:
sudo cpupower frequency-set -g performance
cat << EOF | sudo tee /etc/systemd/system/cpu-performance.service
[Unit]
Description=Set CPU governor to performance
After=multi-user.target
[Service]
Type=oneshot
ExecStart=/usr/bin/cpupower -c all frequency-set -g performance
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl enable cpu-performance.service
sudo systemctl start cpu-performance.service
Running the checks above should now list performance as the governor for all cores. You can also run sudo cpupower -c all frequency-info for more details.
3.7 Prevent the GPU from going idle¶
Similarly to the above, we want to maximize the GPU's clock speed and prevent it from going idle.
Run the following command to check your current clocks and whether they're locked (persistence mode):
nvidia-smi -q | grep -i "Persistence Mode"
nvidia-smi -q -d CLOCK
See an example output
Persistence Mode: Enabled
...
Attached GPUs : 1
GPU 00000005:09:00.0
Clocks
Graphics : 420 MHz
SM : 420 MHz
Memory : 405 MHz
Video : 1680 MHz
Applications Clocks
Graphics : 1800 MHz
Memory : 8001 MHz
Default Applications Clocks
Graphics : 1800 MHz
Memory : 8001 MHz
Deferred Clocks
Memory : N/A
Max Clocks
Graphics : 2100 MHz
SM : 2100 MHz
Memory : 8001 MHz
Video : 1950 MHz
...
To lock the GPU's clocks to their max values:
sudo nvidia-smi -pm 1
sudo nvidia-smi -lgc=$(nvidia-smi --query-gpu=clocks.max.sm --format=csv,noheader,nounits)
sudo nvidia-smi -lmc=$(nvidia-smi --query-gpu=clocks.max.mem --format=csv,noheader,nounits)
cat << EOF | sudo tee /etc/systemd/system/gpu-max-clocks.service
[Unit]
Description=Max GPU clocks
After=multi-user.target
[Service]
Type=oneshot
ExecStart=/usr/bin/nvidia-smi -pm 1
ExecStart=/bin/bash -c '/usr/bin/nvidia-smi --lock-gpu-clocks=$(/usr/bin/nvidia-smi --query-gpu=clocks.max.sm --format=csv,noheader,nounits)'
ExecStart=/bin/bash -c '/usr/bin/nvidia-smi --lock-memory-clocks=$(/usr/bin/nvidia-smi --query-gpu=clocks.max.mem --format=csv,noheader,nounits)'
RemainAfterExit=true
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl enable gpu-max-clocks.service
sudo systemctl start gpu-max-clocks.service
Show explanation
This queries the max clocks for the GPU SM (clocks.max.sm) and memory (clocks.max.mem) and locks the current clocks to those values (lock-gpu-clocks and lock-memory-clocks respectively). -pm 1 (or --persistence-mode=1) enables persistence mode to keep these settings.
See an example output
GPU clocks set to "(gpuClkMin 2100, gpuClkMax 2100)" for GPU 00000005:09:00.0
All done.
Memory clocks set to "(memClkMin 8001, memClkMax 8001)" for GPU 00000005:09:00.0
All done.
You can confirm that the clocks are set to the max values by running nvidia-smi -q -d CLOCK again.
Note
Some max clocks might not be achievable in certain configurations, whether due to boost clocks (SM) or rounding errors (Memory), despite the lock commands indicating they worked. For example - on IGX - the max non-boost SM clock will be 1920 MHz, and the max memory clock will show 8000 MHz, which is satisfactory compared to the initial idle clocks.
3.8 Maximize GPU BAR1 size¶
The GPU BAR1 memory is the primary resource consumed by GPUDirect. It allows other PCIe devices (like the CPU and the NIC) to access the GPU's memory space. The larger the BAR1 size, the more memory the GPU can expose to these devices in a single PCIe transaction, reducing the number of transactions needed and improving performance.
We recommend a BAR1 size of 1GB or above. Check the current BAR1 size:
sudo /opt/nvidia/holoscan/bin/tune_system.py --check bar1-size
cd holohub
sudo ./operators/advanced_network/python/tune_system.py --check bar1-size
See an example output
2025-03-06 12:22:53 - INFO - GPU 00000005:09:00.0: BAR1 size is 8192 MiB.
nvidia-smi -q | grep -A 3 BAR1
See an example output
For our RTX A6000, this shows a BAR1 size of 256 MiB:
BAR1 Memory Usage
Total : 256 MiB
Used : 13 MiB
Free : 243 MiB
Warning
Resizing the BAR1 size requires:
- A BIOS with resizable BAR support
- A GPU with physical resizable BAR
If you attempt to go forward with the instructions below without meeting the above requirements, you might render your GPU unusable.
BIOS Resizable BAR support¶
First, check if your system and BIOS support resizable BAR. Refer to your system manufacturer's documentation to access the BIOS. The Resizable BAR option is often located under Advanced > PCIe settings. Enable this feature if present.
Note
The IGX Developer kit with IGX OS 1.1+ supports resizable BAR by default.
GPU Resizable BAR support¶
Next, you can check if your GPU has physical resizable BAR by running the following command:
sudo lspci -vv -s $(nvidia-smi --query-gpu=pci.bus_id --format=csv,noheader) | grep BAR
See an example output
This RTX A6000 has a resizable BAR1, currently set to 256 MiB:
Capabilities: [bb0 v1] Physical Resizable BAR
BAR 0: current size: 16MB, supported: 16MB
BAR 1: current size: 256MB, supported: 64MB 128MB 256MB 512MB 1GB 2GB 4GB 8GB 16GB 32GB 64GB
BAR 3: current size: 32MB, supported: 32MB
If your GPU is listed on this page, you can download the Display Mode Selector tool to resize the BAR1 to 8GB.
- Press Join Now.
- Once approved, download the Display Mode Selector archive.
- Unzip the archive.
- Access your system without an X-server running, either through SSH or a Virtual Console (Alt+F1).
- Go down to the right OS and architecture folder for your system (linux/aarch64 or linux/x64).
- Run the displaymodeselector command like so:
chmod +x displaymodeselector
sudo ./displaymodeselector --gpumode physical_display_enabled_8GB_bar1
Press y to confirm you'd like to continue, then y again to apply to all the eligible adapters.
See an example output
NVIDIA Display Mode Selector Utility (Version 1.67.0)
Copyright (C) 2015-2021, NVIDIA Corporation. All Rights Reserved.
WARNING: This operation updates the firmware on the board and could make
the device unusable if your host system lacks the necessary support.
Are you sure you want to continue?
Press 'y' to confirm (any other key to abort):
y
Specified GPU Mode "physical_display_enabled_8GB_bar1"
Update GPU Mode of all adapters to "physical_display_enabled_8GB_bar1"?
Press 'y' to confirm or 'n' to choose adapters or any other key to abort:
y
Updating GPU Mode of all eligible adapters to "physical_display_enabled_8GB_bar1"
Apply GPU Mode <6> corresponds to "physical_display_enabled_8GB_bar1"
Reading EEPROM (this operation may take up to 30 seconds)
[==================================================] 100 %
Reading EEPROM (this operation may take up to 30 seconds)
Successfully updated GPU mode to "physical_display_enabled_8GB_bar1" ( Mode 6 ).
A reboot is required for the update to take effect.
Error: unload the NVIDIA kernel driver first
If you see this error:
ERROR: In order to avoid the irreparable damage to your graphics adapter it is necessary to unload the NVIDIA kernel driver first:
rmmod nvidia_uvm nvidia_drm nvidia_modeset nvidia_peermem nvidia
Try to unload the NVIDIA kernel drivers listed in the error message above (the list may vary):
sudo rmmod nvidia_uvm nvidia_drm nvidia_modeset nvidia_peermem nvidia
If this fails because the drivers are in use, stop the X-server first before trying again:
sudo systemctl isolate multi-user
/dev/mem: Operation not permitted. Access to physical memory denied
Disable secure boot on your system ahead of changing your GPU's BAR1 size. It can be re-enabled afterwards.
Reboot your system, and check the BAR1 size again to confirm the change.
sudo reboot
3.9 Enable Jumbo Frames¶
Jumbo frames are Ethernet frames that carry a payload larger than the standard 1500 bytes MTU (Maximum Transmission Unit). They can significantly improve network performance when transferring large amounts of data by reducing the overhead of packet headers and the number of packets that need to be processed.
We recommend an MTU of 9000 bytes on all interfaces involved in the data path. You can check the current MTU of your interfaces:
sudo /opt/nvidia/holoscan/bin/tune_system.py --check mtu
cd holohub
sudo ./operators/advanced_network/python/tune_system.py --check mtu
See an example output
2025-03-06 16:51:19 - INFO - Interface eth0 has an acceptable MTU of 9000 bytes.
2025-03-06 16:51:19 - INFO - Interface eth1 has an acceptable MTU of 9000 bytes.
For a given if_name interface:
if_name=eth0
ip link show dev $if_name | grep -oE "mtu [0-9]+"
See an example output
mtu 1500
You can set the MTU for each interface like so, for a given if_name identified above:
sudo ip link set dev $if_name mtu 9000
sudo nmcli connection modify $if_name ipv4.mtu 9000
sudo nmcli connection up $if_name
Assuming you've set an IP address for the interface above, you can add the MTU to the interface's network configuration file like so:
sudo sed -i '/\[Network\]/a MTU=9000' /etc/systemd/network/20-$if_name.network
sudo systemctl restart systemd-networkd
Can I do more than 9000?
While your NIC might have a maximum MTU capability larger than 9000, we typically recommend setting the MTU to 9000 bytes, as that is the standard size for jumbo frames that's widely supported for compatibility with other network equipment. When using jumbo frames, all devices in the communication path must support the same MTU size. If any device in between has a smaller MTU, packets will be fragmented or dropped, potentially degrading performance.
Example with the CX-7 NIC:
$ ip -d link show dev $if_name | grep -oE "maxmtu [0-9]+"
maxmtu 9978
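To verify that jumbo frames actually traverse the link end to end, you can send a non-fragmentable ping sized to the 9000-byte MTU (a sketch; 8972 bytes of ICMP payload plus 28 bytes of IP/ICMP headers add up to 9000, and 1.1.1.2 stands in for the remote address on your link):
# -M do forbids fragmentation, so the ping fails if any hop has a smaller MTU
ping -c 3 -M do -s 8972 1.1.1.2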
4. Running a test application¶
Holoscan Networking provides a benchmarking application named adv_networking_bench that can be used to test the performance of the networking configuration. In this section, we'll walk you through the steps needed to configure the application for your NIC for Tx and Rx, and run a loopback test between the two interfaces with a physical SFP cable connecting them.
Make sure to install holoscan-networking beforehand.
4.1 Update the loopback configuration¶
Find the application files¶
Identify the location of the adv_networking_bench executable, and of the configuration file named adv_networking_bench_default_tx_rx.yaml, for your installation:
Both located under /opt/nvidia/holoscan/examples/adv_networking_bench/:
ls -1 /opt/nvidia/holoscan/examples/adv_networking_bench/
adv_networking_bench
adv_networking_bench_default_rx_multi_q.yaml
adv_networking_bench_default_tx_rx_hds.yaml
adv_networking_bench_default_tx_rx.yaml
adv_networking_bench_gpunetio_tx_rx.yaml
adv_networking_bench_rmax_rx.yaml
CMakeLists.txt
default_bench_op_rx.h
default_bench_op_tx.h
doca_bench_op_rx.h
doca_bench_op_tx.h
kernels.cu
kernels.cuh
main.cpp
Both located under ./install/examples/adv_networking_bench/:
ls -1 ./install/examples/adv_networking_bench
adv_networking_bench
adv_networking_bench_default_rx_multi_q.yaml
adv_networking_bench_default_tx_rx_hds.yaml
adv_networking_bench_default_tx_rx.yaml
adv_networking_bench_gpunetio_tx_rx.yaml
adv_networking_bench.py
adv_networking_bench_rmax_rx.yaml
CMakeLists.txt
default_bench_op_rx.h
default_bench_op_tx.h
doca_bench_op_rx.h
doca_bench_op_tx.h
kernels.cu
kernels.cuh
main.cpp
Warning
The configuration file is also located alongside the application source code at applications/adv_networking_bench/adv_networking_bench_default_tx_rx.yaml.
However, modifying this file will not affect the configuration used by the application executable without rebuilding the application.
For this reason, we recommend using the configuration file located in the install tree.
Note
The fields in this yaml file will be explained in more detail in a section below. For now, we'll stick to modifying the strict minimum of fields required to run the application as-is on your system.
Identify your NIC's PCIe addresses¶
Retrieve the PCIe addresses of both ports of your NIC. We'll arbitrarily use the first for Tx and the second for Rx here:
sudo ibdev2netdev -v | awk '{print $1}'
# `0200` is the PCI-SIG class code for NICs
# `15b3` is the Vendor ID for Mellanox
lspci -n | awk '$2 == "0200:" && $3 ~ /^15b3:/ {print $1}'
See an example output
0005:03:00.0
0005:03:00.1
Configure the NIC for Tx and Rx¶
Set the NIC addresses in the interfaces section of the advanced_network section, making sure to remove the template brackets < >. This configures your NIC independently of your application:
- Set the address field of the tx_port interface to one of these addresses. That interface will be able to transmit ethernet packets.
- Set the address field of the rx_port interface to the other address. This interface will be able to receive ethernet packets.
interfaces:
- name: "tx_port"
address: <0000:00:00.0> # The BUS address of the interface doing Tx
tx:
...
- name: "rx_port"
address: <0000:00:00.0> # The BUS address of the interface doing Rx
rx:
...
See an example yaml
interfaces:
- name: "tx_port"
address: 0005:03:00.0 # The BUS address of the interface doing Tx
tx:
...
- name: "rx_port"
address: 0005:03:00.1 # The BUS address of the interface doing Rx
rx:
...
Configure the application¶
Modify the bench_tx section, which configures the application itself to create the packet headers and direct them to the NIC. Make sure to remove the template brackets < >:
- Replace eth_dst_addr with the MAC address (and not the PCIe address) of the NIC interface you want to use for Rx. You can get the MAC address of your if_name interface with cat /sys/class/net/$if_name/address.
- Replace address with the PCIe address of the NIC interface you want to use for Tx (same as tx_port's address above).
bench_tx:
...
eth_dst_addr: <00:00:00:00:00:00> # Destination MAC address - required when Rx flow_isolation=true
ip_src_addr: <1.2.3.4> # Source IP address - required on layer 3 network
ip_dst_addr: <5.6.7.8> # Destination IP address - required on layer 3 network
udp_src_port: 4096 # UDP source port
udp_dst_port: 4096 # UDP destination port
address: <0000:00:00.0> # Source NIC Bus ID. Should match the address of the Tx interface above
See an example yaml
bench_tx:
...
eth_dst_addr: 48:b0:2d:ee:83:ad # Destination MAC address - required when Rx flow_isolation=true
ip_src_addr: <1.2.3.4> # Source IP address - required on layer 3 network
ip_dst_addr: <5.6.7.8> # Destination IP address - required on layer 3 network
udp_src_port: 4096 # UDP source port
udp_dst_port: 4096 # UDP destination port
address: 0005:03:00.0 # Source NIC Bus ID. Should match the address of the Tx interface above
Show explanation
- eth_dst_addr - the destination ethernet MAC address - will be embedded in the packet headers by the application. This is required here because the Rx interface above has flow_isolation: true (explained in more detail below). In that configuration, only the packets listing the adequate destination MAC address will be accepted by the Rx interface.
- We ignore the IP fields (ip_src_addr, ip_dst_addr) for now, as we are testing on a layer 2 network by just connecting a cable between the two interfaces on our system, so having mock values has no impact.
- address - the source PCIe address - needs to be defined again to tell the application itself to route the packets to the NIC interface we have configured previously for Tx.
- You might have noted the lack of an eth_src_addr field in the bench_tx section. This is because the source Ethernet MAC address can be inferred automatically from the PCIe address of the Tx interface (below).
4.2 Run the loopback test¶
After having modified the configuration file, ensure you have connected an SFP cable between the two interfaces of your NIC, then run the application with the command below:
sudo /opt/nvidia/holoscan/examples/adv_networking_bench/adv_networking_bench adv_networking_bench_default_tx_rx.yaml
This assumes you have the required dependencies (holoscan, doca, etc.) installed locally on your system.
sudo ./install/examples/adv_networking_bench/adv_networking_bench adv_networking_bench_default_tx_rx.yaml
./dev_container launch \
--img holohub:adv_networking_bench \
--docker_opts "-u 0 --privileged" \
-- bash -c "./install/examples/adv_networking_bench/adv_networking_bench adv_networking_bench_default_tx_rx.yaml"
The application will run indefinitely. You can stop it gracefully with Ctrl-C. You can also uncomment and set the max_duration_ms field in the scheduler section of the configuration file to limit the duration of the run automatically.
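While the benchmark is running, you can optionally watch the NIC's physical port counters from a second terminal to confirm traffic is flowing (a sketch; the *_phy counter names assume a ConnectX NIC, and eth1 stands in for your Rx interface):
watch -n 1 "ethtool -S eth1 | grep -E '(rx|tx)_(packets|bytes)_phy'"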
See an example output
[info] [fragment.cpp:599] Loading extensions from configs...
[info] [gxf_executor.cpp:264] Creating context
[info] [main.cpp:35] Initializing advanced network operator
[info] [main.cpp:40] Using ANO manager dpdk
[info] [adv_network_rx.cpp:35] Adding output port bench_rx_out
[info] [adv_network_rx.cpp:51] AdvNetworkOpRx::initialize()
[info] [adv_network_common.h:607] Finished reading advanced network operator config
[info] [adv_network_dpdk_mgr.cpp:373] Attempting to use 2 ports for high-speed network
[info] [adv_network_dpdk_mgr.cpp:382] Setting DPDK log level to: Info
[info] [adv_network_dpdk_mgr.cpp:402] DPDK EAL arguments: adv_net_operator --file-prefix=nwlrbbmqbh -l 3,11,9 --log-level=9 --log-level=pmd.net.mlx5:info -a 0005:03:00.0,txq_inline_max=0,dv_flow_en=2 -a 0005:03:00.1,txq_inline_max=0,dv_flow_en=2
Log level 9 higher than maximum (8)
EAL: Detected CPU lcores: 12
EAL: Detected NUMA nodes: 1
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/nwlrbbmqbh/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: 1 hugepages of size 1073741824 reserved, but no mounted hugetlbfs found for that size
EAL: Probe PCI driver: mlx5_pci (15b3:1021) device: 0005:03:00.0 (socket -1)
mlx5_net: PCI information matches for device "mlx5_0"
mlx5_net: enhanced MPS is enabled
mlx5_net: port 0 MAC address is 48:B0:2D:EE:83:AC
EAL: Probe PCI driver: mlx5_pci (15b3:1021) device: 0005:03:00.1 (socket -1)
mlx5_net: PCI information matches for device "mlx5_1"
mlx5_net: enhanced MPS is enabled
mlx5_net: port 1 MAC address is 48:B0:2D:EE:83:AD
TELEMETRY: No legacy callbacks, legacy socket not created
[info] [adv_network_dpdk_mgr.cpp:298] Port 0 has no RX queues. Creating dummy queue.
[info] [adv_network_dpdk_mgr.cpp:165] Adjusting buffer size to 9228 for headroom
[info] [adv_network_dpdk_mgr.cpp:165] Adjusting buffer size to 9128 for headroom
[info] [adv_network_dpdk_mgr.cpp:165] Adjusting buffer size to 9128 for headroom
[info] [adv_network_mgr.cpp:116] Registering memory regions
[info] [adv_network_mgr.cpp:178] Successfully allocated memory region MR_Unused_P0 at 0x100fa0000 type 2 with 9100 bytes (32768 elements @ 9228 bytes total 302383104)
[info] [adv_network_mgr.cpp:178] Successfully allocated memory region Data_RX_GPU at 0xffff4fc00000 type 3 with 9000 bytes (51200 elements @ 9128 bytes total 467402752)
[info] [adv_network_mgr.cpp:178] Successfully allocated memory region Data_TX_GPU at 0xffff33e00000 type 3 with 9000 bytes (51200 elements @ 9128 bytes total 467402752)
[info] [adv_network_mgr.cpp:191] Finished allocating memory regions
[info] [adv_network_dpdk_mgr.cpp:223] Successfully registered external memory for Data_TX_GPU
[info] [adv_network_dpdk_mgr.cpp:223] Successfully registered external memory for Data_RX_GPU
[info] [adv_network_dpdk_mgr.cpp:193] Mapped external memory descriptor for 0xffff4fc00000 to device 0
[info] [adv_network_dpdk_mgr.cpp:193] Mapped external memory descriptor for 0xffff33e00000 to device 0
[info] [adv_network_dpdk_mgr.cpp:193] Mapped external memory descriptor for 0xffff4fc00000 to device 1
[info] [adv_network_dpdk_mgr.cpp:193] Mapped external memory descriptor for 0xffff33e00000 to device 1
[info] [adv_network_dpdk_mgr.cpp:454] DPDK init (0005:03:00.0) -- RX: ENABLED TX: ENABLED
[info] [adv_network_dpdk_mgr.cpp:464] Configuring RX queue: UNUSED_P0_Q0 (0) on port 0
[info] [adv_network_dpdk_mgr.cpp:513] Created mempool RXP_P0_Q0_MR0 : mbufs=32768 elsize=9228 ptr=0x10041c380
[info] [adv_network_dpdk_mgr.cpp:523] Max packet size needed for RX: 9100
[info] [adv_network_dpdk_mgr.cpp:564] Configuring TX queue: ADC Samples (0) on port 0
[info] [adv_network_dpdk_mgr.cpp:607] Created mempool TXP_P0_Q0_MR0 : mbufs=51200 elsize=9000 ptr=0x100c1fc00
[info] [adv_network_dpdk_mgr.cpp:621] Max packet size needed with TX: 9100
[info] [adv_network_dpdk_mgr.cpp:632] Setting port config for port 0 mtu:9082
[info] [adv_network_dpdk_mgr.cpp:663] Initializing port 0 with 1 RX queues and 1 TX queues...
mlx5_net: port 0 Tx queues number update: 0 -> 1
mlx5_net: port 0 Rx queues number update: 0 -> 1
[info] [adv_network_dpdk_mgr.cpp:679] Successfully configured ethdev
[info] [adv_network_dpdk_mgr.cpp:689] Successfully set descriptors to 8192/8192
[info] [adv_network_dpdk_mgr.cpp:704] Port 0 not in isolation mode
[info] [adv_network_dpdk_mgr.cpp:713] Setting up port:0, queue:0, Num scatter:1 pool:0x10041c380
[info] [adv_network_dpdk_mgr.cpp:734] Successfully setup RX port 0 queue 0
[info] [adv_network_dpdk_mgr.cpp:756] Successfully set up TX queue 0/0
[info] [adv_network_dpdk_mgr.cpp:761] Enabling promiscuous mode for port 0
mlx5_net: [mlx5dr_cmd_query_caps]: Failed to query wire port regc value
mlx5_net: port 0 Rx queues number update: 1 -> 1
[info] [adv_network_dpdk_mgr.cpp:775] Successfully started port 0
[info] [adv_network_dpdk_mgr.cpp:778] Port 0, MAC address: 48:B0:2D:EE:83:AC
[info] [adv_network_dpdk_mgr.cpp:1111] Applying tx_eth_src offload for port 0
[info] [adv_network_dpdk_mgr.cpp:454] DPDK init (0005:03:00.1) -- RX: ENABLED TX: DISABLED
[info] [adv_network_dpdk_mgr.cpp:464] Configuring RX queue: Data (0) on port 1
[info] [adv_network_dpdk_mgr.cpp:513] Created mempool RXP_P1_Q0_MR0 : mbufs=51200 elsize=9128 ptr=0x125a5b940
[info] [adv_network_dpdk_mgr.cpp:523] Max packet size needed for RX: 9000
[info] [adv_network_dpdk_mgr.cpp:621] Max packet size needed with TX: 9000
[info] [adv_network_dpdk_mgr.cpp:632] Setting port config for port 1 mtu:8982
[info] [adv_network_dpdk_mgr.cpp:663] Initializing port 1 with 1 RX queues and 0 TX queues...
mlx5_net: port 1 Rx queues number update: 0 -> 1
[info] [adv_network_dpdk_mgr.cpp:679] Successfully configured ethdev
[info] [adv_network_dpdk_mgr.cpp:689] Successfully set descriptors to 8192/8192
[info] [adv_network_dpdk_mgr.cpp:701] Port 1 in isolation mode
[info] [adv_network_dpdk_mgr.cpp:713] Setting up port:1, queue:0, Num scatter:1 pool:0x125a5b940
[info] [adv_network_dpdk_mgr.cpp:734] Successfully setup RX port 1 queue 0
[info] [adv_network_dpdk_mgr.cpp:764] Not enabling promiscuous mode on port 1 since flow isolation is enabled
mlx5_net: [mlx5dr_cmd_query_caps]: Failed to query wire port regc value
mlx5_net: port 1 Rx queues number update: 1 -> 1
[info] [adv_network_dpdk_mgr.cpp:775] Successfully started port 1
[info] [adv_network_dpdk_mgr.cpp:778] Port 1, MAC address: 48:B0:2D:EE:83:AD
[info] [adv_network_dpdk_mgr.cpp:790] Adding RX flow ADC Samples
[info] [adv_network_dpdk_mgr.cpp:998] Adding IPv4 length match for 1050
[info] [adv_network_dpdk_mgr.cpp:1018] Adding UDP port match for src/dst 4096/4096
[info] [adv_network_dpdk_mgr.cpp:814] Setting up RX burst pool with 8191 batches of size 81920
[info] [adv_network_dpdk_mgr.cpp:833] Setting up RX burst pool with 8191 batches of size 20480
[info] [adv_network_dpdk_mgr.cpp:875] Setting up TX ring TX_RING_P0_Q0
[info] [adv_network_dpdk_mgr.cpp:901] Setting up TX burst pool TX_BURST_POOL_P0_Q0 with 10240 pointers at 0x125a0d4c0
[info] [adv_network_dpdk_mgr.cpp:1186] Config validated successfully
[info] [adv_network_dpdk_mgr.cpp:1199] Starting advanced network workers
[info] [adv_network_dpdk_mgr.cpp:1278] Flushing packet on port 1
[info] [adv_network_dpdk_mgr.cpp:1478] Starting RX Core 9, port 1, queue 0, socket 0
[info] [adv_network_dpdk_mgr.cpp:1268] Done starting workers
[info] [default_bench_op_tx.h:79] AdvNetworkingBenchDefaultTxOp::initialize()
[info] [adv_network_dpdk_mgr.cpp:1637] Starting TX Core 11, port 0, queue 0 socket 0 using burst pool 0x125a0d4c0 ring 0x127690740
[info] [default_bench_op_tx.h:113] Initialized 4 streams and events
[info] [default_bench_op_tx.h:130] AdvNetworkingBenchDefaultTxOp::initialize() complete
[info] [default_bench_op_rx.h:67] AdvNetworkingBenchDefaultRxOp::initialize()
[info] [gxf_executor.cpp:1797] creating input IOSpec named 'burst_in'
[info] [default_bench_op_rx.h:104] AdvNetworkingBenchDefaultRxOp::initialize() complete
[info] [adv_network_tx.cpp:46] AdvNetworkOpTx::initialize()
[info] [gxf_executor.cpp:1797] creating input IOSpec named 'burst_in'
[info] [adv_network_common.h:607] Finished reading advanced network operator config
[info] [gxf_executor.cpp:2208] Activating Graph...
[info] [gxf_executor.cpp:2238] Running Graph...
[info] [multi_thread_scheduler.cpp:300] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 0]
[info] [multi_thread_scheduler.cpp:300] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 1]
[info] [multi_thread_scheduler.cpp:300] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 2]
[info] [gxf_executor.cpp:2240] Waiting for completion...
[info] [multi_thread_scheduler.cpp:300] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 3]
[info] [multi_thread_scheduler.cpp:300] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 4]
^C[info] [multi_thread_scheduler.cpp:636] Stopping multithread scheduler
[info] [multi_thread_scheduler.cpp:694] Stopping all async jobs
[info] [multi_thread_scheduler.cpp:218] Dispatcher thread has stopped checking jobs
[info] [multi_thread_scheduler.cpp:679] Waiting to join all async threads
[info] [multi_thread_scheduler.cpp:316] Worker Thread [pool name: default_pool, thread uid: 1] exiting.
[info] [multi_thread_scheduler.cpp:702] *********************** DISPATCHER EXEC TIME : 476345.364000 ms
[info] [multi_thread_scheduler.cpp:316] Worker Thread [pool name: default_pool, thread uid: 0] exiting.
[info] [multi_thread_scheduler.cpp:316] Worker Thread [pool name: default_pool, thread uid: 3] exiting.
[info] [multi_thread_scheduler.cpp:371] Event handler thread exiting.
[info] [multi_thread_scheduler.cpp:703] *********************** DISPATCHER WAIT TIME : 47339.961000 ms
[info] [multi_thread_scheduler.cpp:704] *********************** DISPATCHER COUNT : 197630449
[info] [multi_thread_scheduler.cpp:316] Worker Thread [pool name: default_pool, thread uid: 2] exiting.
[info] [multi_thread_scheduler.cpp:705] *********************** WORKER EXEC TIME : 983902.800000 ms
[info] [multi_thread_scheduler.cpp:706] *********************** WORKER WAIT TIME : 1634522.159000 ms
[info] [multi_thread_scheduler.cpp:707] *********************** WORKER COUNT : 11817369
[info] [multi_thread_scheduler.cpp:316] Worker Thread [pool name: default_pool, thread uid: 4] exiting.
[info] [multi_thread_scheduler.cpp:688] All async worker threads joined, deactivating all entities
[info] [adv_network_rx.cpp:46] AdvNetworkOpRx::stop()
[info] [adv_network_dpdk_mgr.cpp:1928] DPDK ANO shutdown called 2
[info] [adv_network_tx.cpp:41] AdvNetworkOpTx::stop()
[info] [adv_network_dpdk_mgr.cpp:1928] DPDK ANO shutdown called 1
[info] [adv_network_dpdk_mgr.cpp:1133] Port 0:
[info] [adv_network_dpdk_mgr.cpp:1135] - Received packets: 0
[info] [adv_network_dpdk_mgr.cpp:1136] - Transmit packets: 6005066864
[info] [adv_network_dpdk_mgr.cpp:1137] - Received bytes: 0
[info] [adv_network_dpdk_mgr.cpp:1138] - Transmit bytes: 6389391347584
[info] [adv_network_dpdk_mgr.cpp:1139] - Missed packets: 0
[info] [adv_network_dpdk_mgr.cpp:1140] - Errored packets: 0
[info] [adv_network_dpdk_mgr.cpp:1141] - RX out of buffers: 0
[info] [adv_network_dpdk_mgr.cpp:1143] ** Extended Stats **
[info] [adv_network_dpdk_mgr.cpp:1173] tx_good_packets: 6005070000
[info] [adv_network_dpdk_mgr.cpp:1173] tx_good_bytes: 6389394480000
[info] [adv_network_dpdk_mgr.cpp:1173] tx_q0_packets: 6005070000
[info] [adv_network_dpdk_mgr.cpp:1173] tx_q0_bytes: 6389394480000
[info] [adv_network_dpdk_mgr.cpp:1173] rx_multicast_bytes: 9589
[info] [adv_network_dpdk_mgr.cpp:1173] rx_multicast_packets: 22
[info] [adv_network_dpdk_mgr.cpp:1173] tx_unicast_bytes: 6389394480000
[info] [adv_network_dpdk_mgr.cpp:1173] tx_multicast_bytes: 9589
[info] [adv_network_dpdk_mgr.cpp:1173] tx_unicast_packets: 6005070000
[info] [adv_network_dpdk_mgr.cpp:1173] tx_multicast_packets: 22
[info] [adv_network_dpdk_mgr.cpp:1173] tx_phy_packets: 6005070022
[info] [adv_network_dpdk_mgr.cpp:1173] rx_phy_packets: 24
[info] [adv_network_dpdk_mgr.cpp:1173] tx_phy_bytes: 6413414769677
[info] [adv_network_dpdk_mgr.cpp:1173] rx_phy_bytes: 9805
[info] [adv_network_dpdk_mgr.cpp:1133] Port 1:
[info] [adv_network_dpdk_mgr.cpp:1135] - Received packets: 6004323692
[info] [adv_network_dpdk_mgr.cpp:1136] - Transmit packets: 0
[info] [adv_network_dpdk_mgr.cpp:1137] - Received bytes: 6388600255072
[info] [adv_network_dpdk_mgr.cpp:1138] - Transmit bytes: 0
[info] [adv_network_dpdk_mgr.cpp:1139] - Missed packets: 746308
[info] [adv_network_dpdk_mgr.cpp:1140] - Errored packets: 0
[info] [adv_network_dpdk_mgr.cpp:1141] - RX out of buffers: 5047027287
[info] [adv_network_dpdk_mgr.cpp:1143] ** Extended Stats **
[info] [adv_network_dpdk_mgr.cpp:1173] rx_good_packets: 6004323692
[info] [adv_network_dpdk_mgr.cpp:1173] rx_good_bytes: 6388600255072
[info] [adv_network_dpdk_mgr.cpp:1173] rx_missed_errors: 746308
[info] [adv_network_dpdk_mgr.cpp:1173] rx_mbuf_allocation_errors: 5047027287
[info] [adv_network_dpdk_mgr.cpp:1173] rx_q0_packets: 6004323692
[info] [adv_network_dpdk_mgr.cpp:1173] rx_q0_bytes: 6388600255072
[info] [adv_network_dpdk_mgr.cpp:1173] rx_q0_errors: 5047027287
[info] [adv_network_dpdk_mgr.cpp:1173] rx_unicast_bytes: 6389394480000
[info] [adv_network_dpdk_mgr.cpp:1173] rx_multicast_bytes: 9589
[info] [adv_network_dpdk_mgr.cpp:1173] rx_unicast_packets: 6005070000
[info] [adv_network_dpdk_mgr.cpp:1173] rx_multicast_packets: 22
[info] [adv_network_dpdk_mgr.cpp:1173] tx_multicast_bytes: 9589
[info] [adv_network_dpdk_mgr.cpp:1173] tx_multicast_packets: 22
[info] [adv_network_dpdk_mgr.cpp:1173] tx_phy_packets: 24
[info] [adv_network_dpdk_mgr.cpp:1173] rx_phy_packets: 6005070022
[info] [adv_network_dpdk_mgr.cpp:1173] tx_phy_bytes: 9805
[info] [adv_network_dpdk_mgr.cpp:1173] rx_phy_bytes: 6413414769677
[info] [adv_network_dpdk_mgr.cpp:1173] rx_out_of_buffer: 746308
[info] [adv_network_dpdk_mgr.cpp:1935] ANO DPDK manager shutting down
[info] [adv_network_dpdk_mgr.cpp:1622] Total packets received by application (port/queue 1/0): 6004323692
[info] [adv_network_dpdk_mgr.cpp:1698] Total packets transmitted by application (port/queue 0/0): 6005070000
[info] [multi_thread_scheduler.cpp:645] Multithread scheduler stopped.
[info] [multi_thread_scheduler.cpp:664] Multithread scheduler finished.
[info] [gxf_executor.cpp:2243] Deactivating Graph...
[info] [multi_thread_scheduler.cpp:491] TOTAL EXECUTION TIME OF SCHEDULER : 523694.460857 ms
[info] [gxf_executor.cpp:2251] Graph execution finished.
[info] [adv_network_dpdk_mgr.cpp:1928] DPDK ANO shutdown called 0
[info] [default_bench_op_tx.h:51] ANO benchmark TX op shutting down
[info] [default_bench_op_rx.h:56] Finished receiver with 6388570603520/6004295680 bytes/packets received and 0 packets dropped
[info] [default_bench_op_rx.h:61] ANO benchmark RX op shutting down
[info] [default_bench_op_rx.h:108] AdvNetworkingBenchDefaultRxOp::freeResources() start
[info] [default_bench_op_rx.h:116] AdvNetworkingBenchDefaultRxOp::freeResources() complete
[info] [gxf_executor.cpp:294] Destroying context
To inspect how fast data moves through the NIC, run mlnx_perf on one of the interfaces in a separate terminal while the application is running:
sudo mlnx_perf -i $if_name
See an example output
On IGX with an RTX A6000, we are able to get close to the 100 Gbps line rate with this configuration:
rx_vport_unicast_packets: 11,614,900
rx_vport_unicast_bytes: 12,358,253,600 Bps = 98,866.2 Mbps
rx_packets_phy: 11,614,847
rx_bytes_phy: 12,404,657,664 Bps = 99,237.26 Mbps
rx_1024_to_1518_bytes_phy: 11,614,936
rx_prio0_bytes: 12,404,738,832 Bps = 99,237.91 Mbps
rx_prio0_packets: 11,614,923
Troubleshooting
EAL: failed to parse device
Make sure to set valid PCIe addresses in the address fields under interfaces, per the instructions above.
Invalid MAC address format
Make sure to set a valid MAC address in the eth_dst_addr field under bench_tx, per the instructions above.
mlx5_common: Fail to create MR for address [...] Could not DMA map EXT memory
Example error:
mlx5_common: Fail to create MR for address (0xffff2fc00000)
mlx5_common: Device 0005:03:00.0 unable to DMA map
[critical] [adv_network_dpdk_mgr.cpp:188] Could not DMA map EXT memory: -1 err=Invalid argument
[critical] [adv_network_dpdk_mgr.cpp:430] Failed to map MRs
EAL: Couldn't get fd on hugepage file [..] error allocating rte services array
Example error:
EAL: get_seg_fd(): open '/mnt/huge/nwlrbbmqbhmap_0' failed: Permission denied
EAL: Couldn't get fd on hugepage file
EAL: error allocating rte services array
EAL: FATAL: rte_service_init() failed
EAL: rte_service_init() failed
Ensure you run as root, using sudo.
EAL: Cannot get hugepage information.
EAL: x hugepages of size x reserved, no mounted hugetlbfs found for that size
Ensure your hugepages are mounted.
EAL: No free x kB hugepages reported on node 0
- Ensure you have allocated hugepages.
- If you have, check whether any are still free with grep Huge /proc/meminfo. See an example output where none are free:
  HugePages_Total:       2
  HugePages_Free:        0
  HugePages_Rsvd:        0
  HugePages_Surp:        0
  Hugepagesize:    1048576 kB
  Hugetlb:         2097152 kB
- If none are free, you can delete dangling hugepages under your hugepage mount point. This happens when a previous run of the application crashed.
  sudo rm -rf /dev/hugepages/*  # default mount point
  sudo rm -rf /mnt/huge/*       # custom mount point
Could not allocate x MB of GPU memory [...] Failed to allocate GPU memory
Check your GPU utilization:
nvidia-smi pmon -c 1
You might need to kill some of the listed processes to free up GPU VRAM.
5. Building your own application¶
This section will guide you through building your own application, using the adv_networking_bench application as an example. Make sure to install holoscan-networking first.
5.1 Understand the configuration parameters¶
Note
The configuration below is analyzed in the context of the application consuming it, as defined in the main.cpp file. Refer to it whenever the "sample application code" is mentioned:
- /opt/nvidia/holoscan/examples/adv_networking_bench/main.cpp (local installation)
- ./applications/adv_networking_bench/cpp/main.cpp (HoloHub repository)
If you are not yet familiar with how Holoscan applications are constructed, please refer to the Holoscan SDK documentation first.
Let's look at the adv_networking_bench_default_tx_rx.yaml file below.
scheduler:
check_recession_period_ms: 0
worker_thread_number: 5
stop_on_deadlock: true
stop_on_deadlock_timeout: 500
# max_duration_ms: 20000
advanced_network:
cfg:
version: 1
manager: "dpdk"
master_core: 3
debug: false
log_level: "info"
memory_regions:
- name: "Data_TX_GPU"
kind: "device"
affinity: 0
num_bufs: 51200
buf_size: 1064
- name: "Data_RX_GPU"
kind: "device"
affinity: 0
num_bufs: 51200
buf_size: 1000
- name: "Data_RX_CPU"
kind: "huge"
affinity: 0
num_bufs: 51200
buf_size: 64
interfaces:
- name: "tx_port"
address: <0000:00:00.0>
tx:
queues:
- name: "tx_q_0"
id: 0
batch_size: 10240
cpu_core: 11
memory_regions:
- "Data_TX_GPU"
offloads:
- "tx_eth_src"
- name: "rx_port"
address: <0000:00:00.0>
rx:
flow_isolation: true
queues:
- name: "rx_q_0"
id: 0
cpu_core: 9
batch_size: 10240
output_port: "bench_rx_out"
memory_regions:
- "Data_RX_CPU"
- "Data_RX_GPU"
flows:
- name: "flow_0"
id: 0
action:
type: queue
id: 0
match:
udp_src: 4096
udp_dst: 4096
ipv4_len: 1050
bench_rx:
gpu_direct: true # Set to true if using a GPU region for the Rx queues.
split_boundary: true # Whether header and data are split for Rx (Header to CPU)
batch_size: 10240
max_packet_size: 1064
header_size: 64
bench_tx:
gpu_direct: true # Set to true if using a GPU region for the Tx queues.
split_boundary: 0 # Byte boundary where header and data are split for Tx, 0 if no split
batch_size: 10240
payload_size: 1000
header_size: 64
eth_dst_addr: <00:00:00:00:00:00> # Destination MAC address - required when Rx flow_isolation=true
ip_src_addr: <1.2.3.4> # Source IP address - required on layer 3 network
ip_dst_addr: <5.6.7.8> # Destination IP address - required on layer 3 network
udp_src_port: 4096 # UDP source port
udp_dst_port: 4096 # UDP destination port
address: <0000:00:00.0> # Source NIC Bus ID. Should match the address of the Tx interface above
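To connect these configuration sections to the code that consumes them, here is a minimal sketch of how the Rx side of the sample graph can be wired. This is an illustration, not the actual main.cpp: the include paths, operator namespaces, and constructor arguments are assumptions inferred from the configuration and log output above, so defer to the sample application code for the authoritative version.

// Minimal sketch (not the actual main.cpp): wiring the "advanced_network" and "bench_rx"
// config sections to operators. The class names match the log output above; the include
// paths, namespaces, and constructor arguments are assumptions -- see the sample main.cpp.
#include <holoscan/holoscan.hpp>
// Also include the advanced_network Rx operator header and your Rx operator header here.

class MyRxApp : public holoscan::Application {
 public:
  void compose() override {
    using namespace holoscan;

    // Reads the "advanced_network" section (interfaces, queues, flows, memory_regions)
    // and emits packet bursts on the port named by `output_port` ("bench_rx_out").
    auto adv_net_rx = make_operator<ops::AdvNetworkOpRx>(
        "adv_network_rx", from_config("advanced_network"));

    // Reads the "bench_rx" section (gpu_direct, split_boundary, batch_size, ...)
    // and consumes bursts on its "burst_in" port, as seen in the log output.
    auto bench_rx = make_operator<ops::AdvNetworkingBenchDefaultRxOp>(
        "bench_rx", from_config("bench_rx"));

    add_flow(adv_net_rx, bench_rx, {{"bench_rx_out", "burst_in"}});
  }
};

int main(int argc, char** argv) {
  if (argc < 2) { return 1; }
  auto app = holoscan::make_application<MyRxApp>();
  app->config(argv[1]);  // e.g. adv_networking_bench_default_tx_rx.yaml
  // The real application also sets up a multi-thread scheduler from the "scheduler" section.
  app->run();
  return 0;
}

The Tx side follows the same pattern in reverse: a transmit operator emits bursts that AdvNetworkOpTx consumes on its own burst_in input port (also visible in the log output).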
5.2 Create your own Rx operator¶
Under construction
This section is under construction. Refer to the implementation of the AdvNetworkingBenchDefaultRxOp for an example:
- /opt/nvidia/holoscan/examples/adv_networking_bench/default_bench_op_rx.h (local installation)
- ./applications/adv_networking_bench/cpp/default_bench_op_rx.h (HoloHub repository)
Note
Design investigations are expected soon for a generic packet aggregator operator.
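In the meantime, the sketch below shows the general shape such an operator can take: a Holoscan operator with a single input port whose name is connected to the output_port configured on the Rx queue. The burst type and the packet accessor/free calls depend on the advanced_network API version, so the PacketBurst placeholder and the commented steps are assumptions; take the concrete types and helper functions from default_bench_op_rx.h.

// Minimal sketch of a custom Rx operator. `PacketBurst` is a hypothetical placeholder
// for the burst structure emitted by AdvNetworkOpRx; replace it and the commented steps
// with the concrete types and helper calls used in default_bench_op_rx.h.
#include <memory>
#include <holoscan/holoscan.hpp>

namespace holoscan::ops {

struct PacketBurst {};  // placeholder only (hypothetical)

class MyPacketRxOp : public Operator {
 public:
  HOLOSCAN_OPERATOR_FORWARD_ARGS(MyPacketRxOp)
  MyPacketRxOp() = default;

  void setup(OperatorSpec& spec) override {
    // Connected to the advanced_network Rx output ("bench_rx_out") via add_flow().
    spec.input<std::shared_ptr<PacketBurst>>("burst_in");
  }

  void compute(InputContext& op_input, OutputContext&, ExecutionContext&) override {
    auto burst = op_input.receive<std::shared_ptr<PacketBurst>>("burst_in");
    if (!burst) { return; }
    // Typical steps (see default_bench_op_rx.h for the real calls):
    //  1. Query the number of packets in the burst.
    //  2. Access headers (CPU memory region) and payloads (GPU memory region when
    //     gpu_direct/split_boundary are enabled) for each packet.
    //  3. Process or copy the payloads, e.g. batch them into a contiguous GPU buffer.
    //  4. Free the packets and the burst so the NIC can reuse the buffers.
  }
};

}  // namespace holoscan::ops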
5.3 Create your own Tx operator¶
Under construction
This section is under construction. Refer to the implementation of the AdvNetworkingBenchDefaultTxOp for an example:
- /opt/nvidia/holoscan/examples/adv_networking_bench/default_bench_op_tx.h (local installation)
- ./applications/adv_networking_bench/cpp/default_bench_op_tx.h (HoloHub repository)
Note
Design investigations are expected soon for a generic way to prepare packets to send to the NIC.
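In the meantime, here is a matching sketch for the transmit side: an operator that builds a burst of packets and emits it toward AdvNetworkOpTx (whose burst_in input port appears in the log output above). Again, PacketBurst, the "burst_out" port name, and the commented steps are placeholders and assumptions; the concrete burst allocation and header/payload filling calls are in default_bench_op_tx.h.

// Minimal sketch of a custom Tx operator. `PacketBurst` and the "burst_out" port name are
// hypothetical placeholders; take the real burst allocation and packet-filling calls from
// default_bench_op_tx.h.
#include <memory>
#include <holoscan/holoscan.hpp>

namespace holoscan::ops {

struct PacketBurst {};  // placeholder only (hypothetical)

class MyPacketTxOp : public Operator {
 public:
  HOLOSCAN_OPERATOR_FORWARD_ARGS(MyPacketTxOp)
  MyPacketTxOp() = default;

  void setup(OperatorSpec& spec) override {
    // Connect this port to the AdvNetworkOpTx "burst_in" input via add_flow().
    spec.output<std::shared_ptr<PacketBurst>>("burst_out");
  }

  void compute(InputContext&, OutputContext& op_output, ExecutionContext&) override {
    // Typical steps (see default_bench_op_tx.h for the real calls):
    //  1. Allocate a burst and `batch_size` packet buffers from the Tx memory region(s).
    //  2. Fill the Ethernet/IP/UDP headers (or rely on offloads such as tx_eth_src) and
    //     write the payloads, e.g. from GPU memory when gpu_direct is enabled.
    //  3. Emit the burst so AdvNetworkOpTx can hand it to the NIC.
    auto burst = std::make_shared<PacketBurst>();
    op_output.emit(burst, "burst_out");
  }
};

}  // namespace holoscan::ops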
5.4 Build with CMake¶
If you are building against a local installation of holoscan-networking:
- Create a source directory and write your source file(s) for your application (and custom operators if needed).
- Create a CMakeLists.txt file in your source directory like this one:
  cmake_minimum_required(VERSION 3.20)
  project(my_app CXX)  # Add CUDA if writing .cu kernels

  find_package(holoscan 2.6 REQUIRED CONFIG PATHS "/opt/nvidia/holoscan")
  find_package(holoscan-networking REQUIRED CONFIG PATHS "/opt/nvidia/holoscan")

  # Create an executable
  add_executable(my_app
    my_app.cpp
    ...
  )
  target_include_directories(my_app PRIVATE
    my_include_dirs/
    ...
  )
  target_link_libraries(my_app PRIVATE
    holoscan::core
    holoscan::ops::advanced_network_rx
    holoscan::ops::advanced_network_tx
    my_other_dependencies
    ...
  )

  # Copy the config file to the build directory for convenience referring to it
  add_custom_target(my_app_config_yaml
    COMMAND ${CMAKE_COMMAND} -E copy_if_different
            "${CMAKE_CURRENT_SOURCE_DIR}/my_app_config.yaml" ${CMAKE_CURRENT_BINARY_DIR}
    DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/my_app_config.yaml"
  )
  add_dependencies(my_app my_app_config_yaml)
- Build your application like so:
  # Your chosen paths
  src_dir="."
  build_dir="build"

  # Configure the build
  cmake -S "$src_dir" -B "$build_dir"

  # Build the application
  cmake --build "$build_dir" -j
  Failed to detect a default CUDA architecture.
  Add the path to your installation of nvcc to your PATH, or pass it to the cmake configuration command like so (adjust to your CUDA/nvcc installation path):
  cmake -S "$src_dir" -B "$build_dir" -D CMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc
- Run your application like so:
  ./"$build_dir"/my_app my_app_config.yaml
Alternatively, if you are working within the HoloHub repository:
- Create an application directory under applications/ in your clone of the HoloHub repository, and write your source file(s) for your application (and custom operators if needed).
- Add the following to the applications/CMakeLists.txt file:
  add_holohub_application(my_app DEPENDS OPERATORS advanced_network)
- Create a CMakeLists.txt file in your application directory like this one:
  cmake_minimum_required(VERSION 3.20)
  project(my_app CXX)  # Add CUDA if writing .cu kernels

  find_package(holoscan 2.6 REQUIRED CONFIG PATHS "/opt/nvidia/holoscan")

  # Create an executable
  add_executable(my_app
    my_app.cpp
    ...
  )
  target_include_directories(my_app PRIVATE
    my_include_dirs/
    ...
  )
  target_link_libraries(my_app PRIVATE
    holoscan::core
    holoscan::ops::advanced_network_rx
    holoscan::ops::advanced_network_tx
    my_other_dependencies
    ...
  )

  # Copy the config file to the build directory for convenience referring to it
  add_custom_target(my_app_config_yaml
    COMMAND ${CMAKE_COMMAND} -E copy_if_different
            "${CMAKE_CURRENT_SOURCE_DIR}/my_app_config.yaml" ${CMAKE_CURRENT_BINARY_DIR}
    DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/my_app_config.yaml"
  )
  add_dependencies(my_app my_app_config_yaml)
- Build your application like so:
  ./dev_container build_and_run my_app --no_run
- Run your application like so:
  ./dev_container launch --img holohub:my_app --docker_opts "-u 0 --privileged" -- bash -c "./build/my_app/applications/my_app my_app_config.yaml"
  or, if you have set up a shortcut to run your application with its config file through its metadata.json (see other apps for examples):
  ./dev_container build_and_run my_app --no_build --container_args "-u 0 --privileged"