
Release Benchmarking Guide

Authors: Holoscan Team (NVIDIA)
Supported platforms: x86_64, aarch64
Last modified: March 20, 2025
Latest version: 0.1.0
Minimum Holoscan SDK version: 2.3.0
Tested Holoscan SDK versions: 2.3.0
Contribution metric: Level 3 - Developmental

This tutorial provides a reproducible workflow for developers to accurately measure the latency of curated HoloHub applications across various SDK releases and different deployment scenarios, from single-application to multi-model use cases.

Developers can use the Holoscan Flow Benchmarking tools referenced within this guide to systematically analyze performance bottlenecks, optimize execution times, and fine-tune their own applications for real-time, low-latency processing.

Contents

- Background
- Previous Holoscan Release Benchmark Reports
- Running Benchmarks: Getting Started
- Summarizing Data
- Presenting Data
- (Optional) Generating a PDF report document
- (Optional) Submitting Results to HoloHub
- Cleanup
- Troubleshooting
- Developer References

Background

Holoscan SDK emphasizes low end-to-end latency in application pipelines. In addition to other benchmarks, we can use HoloHub applications to evaluate Holoscan SDK performance over releases.

In this tutorial we provide a reproducible workflow to evaluate end-to-end latency performance on the Endoscopy Tool Tracking and Multi-AI Ultrasound HoloHub projects. These projects are generally maintained by the NVIDIA Holoscan team and demonstrate baseline Holoscan SDK inference pipelines with video replay and Holoviz rendering output.

Benchmark scenarios include:

- Running multiple Holoscan SDK pipelines concurrently on a single machine
- Running video replay input at real-time speeds or as fast as possible
- Running Holoviz output with either visual rendering or in headless mode

We plan to release HoloHub benchmarks in the release subfolder following Holoscan SDK general releases. You can follow the tutorial below to similarly evaluate performance on your own machine.

Refer to related documents for more information:

- The results report template file provides additional information on definitions and background.
- Versioned releases are available for review in the release subfolder.

Previous Holoscan Release Benchmark Reports

Previously published reports are available for review in the release subfolder.

Running Benchmarks: Getting Started

Data collection can be run in the HoloHub base container for both the Endoscopy Tool Tracking and the Multi-AI Ultrasound applications. We've provided a custom Dockerfile with tools to process collected data into a benchmark report.

# Build the container
./dev_container build \
    --img holohub:release_benchmarking \
    --docker_file benchmarks/release_benchmarking/Dockerfile \
    --base_img nvcr.io/nvidia/clara-holoscan/holoscan:<holoscan-sdk-version>-$(./dev_container get_host_gpu)

# Launch the dev environment
./dev_container launch --img holohub:release_benchmarking

# Inside the container, build the applications in benchmarking mode
./run build endoscopy_tool_tracking --benchmark
./run build multiai_ultrasound --benchmark

./run build release_benchmarking

Run the benchmarking script with no arguments to collect performance logs in the ./output directory.

./run launch release_benchmarking
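
After the run completes, you can spot-check that performance logs were produced by listing the output folder. The path below assumes the default output location referenced elsewhere in this guide; the exact filenames depend on the applications and run configuration.

# List collected performance logs (path assumes the default output location)
ls benchmarks/release_benchmarking/output/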

Summarizing Data

After running benchmarks, inside the dev environment, use ./run launch to process the collected data into summary statistics and bar plot PNGs:

./dev_container launch --img holohub:release_benchmarking
./run launch release_benchmarking --extra_args "--process benchmarks/release_benchmarking"

Alternatively, collect results across platforms. On each machine:

  1. Run benchmarks:

    ./run launch release_benchmarking

  2. Add platform configuration information:

    ./run launch release_benchmarking --extra_args "--print" > benchmarks/release_benchmarking/output/platform.txt

  3. Transfer output contents from each platform to a single machine (an example SCP transfer follows this list):

    # Compress information for transfer
    pushd benchmarks/release_benchmarking
    tar czvf benchmarks-<platform-name>.tar.gz output/*

    # Migrate the results archive with your transfer tool of choice, such as SCP

    # Extract results to a subfolder on the target machine
    mkdir -p output/<release>/<platform-name>/
    pushd output/<release>/<platform-name>
    tar xvf benchmarks-<platform-name>.tar.gz

  4. Use multiple --process flags to generate a batch of bar plots for multiple platform results:

    ./run launch release_benchmarking --extra_args "\
        --process benchmarks/release_benchmarking/2.4/x86_64 \
        --process benchmarks/release_benchmarking/2.4/IGX_iGPU \
        --process benchmarks/release_benchmarking/2.4/IGX_dGPU"
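
For the transfer step above, any file copy tool works. As one example, the SCP command below copies the archive from a benchmarking machine to the collection machine; the user, hostname, and destination path are placeholders to substitute for your environment.

# Copy the results archive to the machine that will aggregate the reports
scp benchmarks-<platform-name>.tar.gz <user>@<collection-host>:/path/to/holohub/benchmarks/release_benchmarking/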

Presenting Data

You can use the template markdown file in the template folder to generate a markdown or PDF report from the benchmark data using Jinja2 and pandoc.

  1. Copy and edit template/release.json with information about the benchmarking configuration, including the release version, platform configurations, and local paths to processed data (a combined example of steps 1 and 2 follows this list). Run ./run launch to print JSON-formatted platform details about the current system to the console:
    ./dev_container launch --img holohub:release_benchmarking
    ./run launch release_benchmarking --extra_args "--print"
    
  2. Render the document with the Jinja CLI tool:
    pushd benchmarks/release_benchmarking
    jinja2 template/results.md.tmpl template/<release-version>.json --format=json > output/<release-version>.md
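
Putting both steps together for a hypothetical 2.4 report, the sequence might look like the following; the 2.4 filenames are placeholders that follow the <release-version> pattern above, not files shipped with the repository.

# From the HoloHub root: work inside the benchmark folder
pushd benchmarks/release_benchmarking

# Step 1: copy the template configuration, then edit the copy with your release and platform details
cp template/release.json template/2.4.json

# Step 2: render the report from the edited configuration
jinja2 template/results.md.tmpl template/2.4.json --format=json > output/2.4.md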
    

(Optional) Generating a PDF report document

Converting the report to PDF is an easy way to share it as a single file with embedded plots.

  1. In your copy of template/release.json, update the "format" string to "pdf".
  2. Follow the instructions above to generate your markdown report with Jinja2.
  3. Use pandoc to convert the markdown file to PDF:
    pushd output
    pandoc <release-version>.md -o <release-version>.pdf --toc
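
Note that pandoc needs a LaTeX engine to produce PDF output. The benchmarking container may already provide one; if the conversion fails with an error about a missing pdflatex, installing a TeX distribution usually resolves it. The package names below are a typical Debian/Ubuntu example and are an assumption about your environment.

# Install a minimal LaTeX toolchain for pandoc PDF output (Debian/Ubuntu example)
sudo apt-get update && sudo apt-get install -y texlive-latex-base texlive-latex-recommended texlive-fonts-recommended lmodern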
    

(Optional) Submitting Results to HoloHub

The Holoscan SDK team may submit release benchmarking reports to HoloHub git history for general visibility. We use Markdown formatting to make plot diagrams accessible for direct download.

  1. Move <release-version>.md and accompanying plots to a new release/<version> folder.
  2. Update image paths in <release-version>.md and verify locally with a markdown renderer such as VS Code.
  3. Commit changes, push to GitHub, and open a Pull Request.
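
As a rough sketch of those steps from the HoloHub root: the branch name, commit message, remote name, and the assumption that the plots sit next to the report in output/ are all placeholders or assumptions, not project conventions.

# Stage the report and plots under a versioned release folder
mkdir -p benchmarks/release_benchmarking/release/<version>
mv benchmarks/release_benchmarking/output/<release-version>.md \
   benchmarks/release_benchmarking/output/*.png \
   benchmarks/release_benchmarking/release/<version>/

# Commit on a branch and push to your fork to open a Pull Request
git checkout -b release-benchmarks-<version>
git add benchmarks/release_benchmarking/release/<version>
git commit -s -m "Add <version> release benchmark report"
git push <your-fork> release-benchmarks-<version>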

Cleanup

Benchmarking changes to application YAML files can be discarded after benchmarks complete.

git checkout applications/*.yaml
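
To see exactly which files the benchmarking build modified before discarding the changes, a quick status check helps:

# Show modified files under applications/ before reverting them
git status --short applications/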

Troubleshooting

Why am I seeing high end-to-end latency spikes as outliers in my data?

Latency spikes may occur in display-driven benchmarking if the display goes to sleep. Before running benchmarks, configure your display settings so that the display does not go to sleep.
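
For example, on a GNOME/X11 desktop (an assumption; other desktop environments use different settings), display sleep and screen blanking can be disabled from a terminal:

# Never blank the screen due to inactivity (GNOME)
gsettings set org.gnome.desktop.session idle-delay 0
# Disable the X11 screensaver and DPMS power management for the current session
xset s off -dpms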

We have also infrequently observed latency spikes in cases where display drivers and CUDA Toolkit versions are not matched, and due to suboptimal GPU task preemption policies. We are still investigating these issues.
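
To check whether your display driver and CUDA Toolkit versions line up, compare the versions reported by the commands below; nvidia-smi reports the installed driver and the highest CUDA version that driver supports, while nvcc reports the installed toolkit.

# Driver version and highest supported CUDA version
nvidia-smi
# Installed CUDA Toolkit version
nvcc --version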

Benchmark applications are failing silently without writing log files.

Silent failures may indicate an issue with the underlying applications undergoing benchmarking. Try running the applications directly and verify execution is as expected:

- ./run launch endoscopy_tool_tracking cpp
- ./run launch multiai_ultrasound cpp

In some cases you may need to clear your HoloHub build or data folders to address errors:

- ./run clear_cache
- rm -rf ./data

Developer References

While this tutorial is tailored to curated configurations of the Endoscopy Tool Tracking and Multi-AI Ultrasound HoloHub applications, developers can use the underlying Holoscan data frame flow tracking tools to similarly measure and analyze performance in custom Holoscan applications.
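
As a starting point, the sketch below shows the general shape of invoking those tools from the HoloHub root against a custom application. The script path and flag values are written as an illustrative assumption and should be checked against benchmarks/holoscan_flow_benchmarking/README.md before use.

# Illustrative only: benchmark a custom application with the Holoscan Flow Benchmarking tools,
# then summarize the collected logs with the accompanying analysis scripts
# (verify script names and options in benchmarks/holoscan_flow_benchmarking/README.md)
python benchmarks/holoscan_flow_benchmarking/benchmark.py \
    -a my_custom_application -r 3 -i 1 -m 1000 -d my_benchmark_logs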