Holoscan is a library for building multi-modal, multi-rate processing pipelines with accelerated I/O and TensorRT-powered inference, in Python or C++.
Implement operators in Python or C++, compose them in a graph, and iterate quickly with clear APIs.
Define dataflow graphs and let the runtime schedule work across operators — no manual threading or locks.
A single-process, multi-threaded execution model with efficient message passing, so real-time pipelines fit on edge devices.
Isolate GPU work per operator for better resource control and to avoid interference between pipeline stages.
Built-in tools to trace execution, log events, and analyze timing so you can tune and debug pipelines.
Run entire pipelines on the GPU with CUDA graphs and streaming; minimize CPU involvement for predictable latency.
Ingest high-bandwidth sensor and network streams with Holoscan Sensor Bridge (HSB) and Ethernet support, ready for real-time processing.
Reference implementations and examples using industry frameworks for low-latency, CPU-offloaded I/O.
Architected to handle extreme throughput so your pipelines keep up with multi-Gbps sensors and feeds.
$ pip install holoscan-cu12
$ python
Python 3.12.3
>>> from holoscan.core import Application
>>> app = Application()
>>> app.run()
Holoscan offers both Python and C++ packages, as well as NGC containers, Conda packages, and Yocto recipes.
The full installation guide (NGC containers, Conda, platform-specific steps) is available in the SDK Installation docs.
Operators — the nodes in your compute graph — support any number of input or output ports. Define them in your application or package them into reusable libraries.
Call into CuPy, MATX, or any other accelerated libraries. Holoscan uses the industry-standard DLPack format for tensors.
import cusignal
from holoscan.core import Operator, OperatorSpec

class ResampleTwoThirdsOp(Operator):
    def setup(self, spec: OperatorSpec):
        spec.input("in")
        spec.output("out")

    def compute(self, op_input, op_output, context):
        sig = op_input.receive("in")
        resample_sig = cusignal.resample_poly(sig, 2, 3)
        op_output.emit(resample_sig, "out")
Browse our catalog of built-in operators for I/O, inference, and viz, including TensorRT-based inference, EtherCAT motor control, and GeForce Now streaming server and client operators.
Plus, Holoscan plays well with others: bridge to ROS2, GStreamer, WebRTC (server and client), and DDS.
Create instances of your operators and any resources you need, and wire them together in your application. Each port on an operator has conditions that let the scheduler determine when to run the operator’s compute() method.
Holoscan provides both single-threaded and multi-threaded schedulers that evaluate your graph and schedule operations. Built-in resource management lets your operators use allocators, clocks, and other shared resources for memory and execution control.
from holoscan.core import Application
from holoscan.operators import FormatConverterOp, HolovizOp, V4L2VideoCaptureOp
from holoscan.resources import RMMAllocator

class WebcamViewer(Application):
    def compose(self):
        source = V4L2VideoCaptureOp(self, pass_through=True)
        fmt = FormatConverterOp(self, in_dtype="yuyv", out_dtype="rgb888", pool=RMMAllocator(self))
        viz = HolovizOp(self)
        self.add_flow(source, fmt)
        self.add_flow(fmt, viz, {("tensor", "receivers")})

app = WebcamViewer()
app.run()
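Opting into the multi-threaded scheduler is a small configuration change on the application before it runs. A sketch, assuming the WebcamViewer application above (the worker_thread_number value is illustrative, not a recommendation):

```python
from holoscan.schedulers import MultiThreadScheduler

app = WebcamViewer()
# Replace the default single-threaded (greedy) scheduler with the
# multi-threaded one; tune worker_thread_number for your pipeline.
app.scheduler(MultiThreadScheduler(app, worker_thread_number=4))
app.run()
```

With multiple workers, operators whose port conditions are satisfied can execute concurrently instead of one at a time.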
Announcing Holoscan SDK 4.0
Build dynamic, distributed pub/sub apps and GPU-resident graphs for real-time physical AI systems.
Building an SMPTE ST 2110 Video Ingest Operator for Holoscan
A walkthrough of implementing a GPU-resident SMPTE ST 2110 operator for NVIDIA Holoscan, enabling ultra-low-latency uncompressed video ingest for real-time AI pipelines on edge systems like Jetson AGX Thor.