TapNext Inference Operator#

Authors: Holoscan Team (NVIDIA)
Supported platforms: x86_64, aarch64
Language: C++, Python
Last modified: March 13, 2026
Latest version: 1.0
Minimum Holoscan SDK version: 3.8.0
Tested Holoscan SDK versions: 3.8.0
Contribution metric: Level 1 - Highly Reliable

The tapnext_inference operator runs TensorRT inference for TapNext-style architectures, where an initialization model processes the first frame and a forward model processes all subsequent frames while maintaining internal state between steps. It is designed for dense point tracking across video sequences.

This operator wraps the TapNextInference GXF extension.

`holoscan::ops::TapNextInferenceOp`#

Operator class to perform TapNext inference (C++ and Python).

How It Works#

Each call receives one or more messages containing at least a step tensor (int32, device memory) and a video frame. The step value selects which engine runs:

Step 0 (Init) -- The Init TensorRT engine runs. Its outputs include state tensors that are copied into internal storage for the next step.
Step > 0 (Forward) -- The Forward TensorRT engine runs. State tensors from the previous step are fed as inputs, and updated state tensors are written back to internal storage after inference.
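The two-phase dispatch above can be sketched in plain Python. This is only an illustration of the control flow, not the operator's implementation: the `TwoPhaseRunner` class and the `init_fn` / `forward_fn` callables that stand in for the two TensorRT engines are hypothetical names introduced here, and the guard against a Forward step without prior state is a choice of this sketch.

```python
from typing import Callable, Dict, Iterable

Tensors = Dict[str, object]  # tensor name -> tensor (any object in this sketch)


class TwoPhaseRunner:
    """Hypothetical stand-in for the Init/Forward dispatch with carried state."""

    def __init__(self, init_fn: Callable[[Tensors], Tensors],
                 forward_fn: Callable[[Tensors], Tensors],
                 state_names: Iterable[str]) -> None:
        self.init_fn = init_fn          # stands in for the Init engine
        self.forward_fn = forward_fn    # stands in for the Forward engine
        self.state_names = list(state_names)
        self.state: Tensors = {}        # internal storage kept between steps

    def tick(self, step: int, inputs: Tensors) -> Tensors:
        if step == 0:
            # Init: no previous state is needed.
            outputs = self.init_fn(dict(inputs))
        else:
            # Sketch-only guard; the operator's behavior in this case is not documented here.
            if not self.state:
                raise RuntimeError("Forward step received before any Init step")
            # Forward: feed the stored state alongside the new inputs.
            outputs = self.forward_fn({**inputs, **self.state})
        # Copy the state tensors back into internal storage for the next step.
        self.state = {name: outputs[name] for name in self.state_names}
        return outputs


# Toy "engines": the state is a running count of frames seen.
def toy_init(t: Tensors) -> Tensors:
    return {"tracks": t["video"], "state": 1}


def toy_forward(t: Tensors) -> Tensors:
    return {"tracks": t["video"], "state": t["state"] + 1}


runner = TwoPhaseRunner(toy_init, toy_forward, state_names=["state"])
results = [runner.tick(step, {"video": f"frame{step}"}) for step in range(3)]
```

In this toy run the state value grows by one per frame, which shows that each Forward call sees the state written by the previous call.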

Query points are generated once at startup as an evenly spaced grid_size x grid_size grid and bound as the query_points input on every tick.
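As a rough illustration of such a grid, the sketch below generates N x N points over an image of a given size. The exact spacing rule and coordinate order used by the operator are not stated here, so this example assumes points at the centers of equal-sized cells, returned in (x, y) order; `make_query_grid` is a hypothetical helper, not part of the operator's API.

```python
from typing import List, Tuple


def make_query_grid(grid_size: int, height: int, width: int) -> List[Tuple[float, float]]:
    """Return grid_size * grid_size (x, y) points spread evenly over the image.

    Assumption of this sketch: points sit at the centers of equal-sized cells.
    """
    if grid_size <= 0:
        raise ValueError("grid_size must be positive")
    step_y = height / grid_size
    step_x = width / grid_size
    return [
        ((col + 0.5) * step_x, (row + 0.5) * step_y)
        for row in range(grid_size)
        for col in range(grid_size)
    ]


# Parameter defaults from this page: grid_size=15, grid_height=256, grid_width=256.
points = make_query_grid(grid_size=15, height=256, width=256)
```

With the default grid_size of 15 this yields 225 query points, all strictly inside the 256 x 256 image.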

Supported Model Formats#

ONNX (.onnx) -- Automatically converted to a TensorRT engine on first use and cached in engine_cache_dir. Conversion can take several minutes.
Pre-built TensorRT engines (.engine / .plan) -- Used directly, skipping conversion.
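The choice between converting and loading directly can be pictured as a check on the file extension. The sketch below is an assumption-laden illustration: `resolve_engine` is a hypothetical helper, the cache file name (`<model stem>.engine`) is invented for the example, and it assumes force_engine_update has no effect on pre-built engines because there is nothing to rebuild them from.

```python
from pathlib import Path
from typing import Tuple

ENGINE_SUFFIXES = {".engine", ".plan"}


def resolve_engine(model_path: str, engine_cache_dir: str,
                   force_engine_update: bool = False) -> Tuple[Path, bool]:
    """Return (engine_path, needs_build) for a model file.

    The cache file name used here is an assumption of this sketch.
    """
    model = Path(model_path)
    suffix = model.suffix.lower()
    if suffix in ENGINE_SUFFIXES:
        # Pre-built engine: use it as-is (assumed: never rebuilt).
        return model, False
    if suffix != ".onnx":
        raise ValueError(f"Unsupported model format: {model.suffix!r}")
    cached = Path(engine_cache_dir) / (model.stem + ".engine")
    # Build when forced, or when no cached engine is present yet.
    needs_build = force_engine_update or not cached.exists()
    return cached, needs_build
```

Because the first conversion can take several minutes, pointing engine_cache_dir at a persistent location avoids repeating that cost on every run.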

Parameters#

model_file_path_init: Path to ONNX (or engine) model for initialization.
    type: std::string
model_file_path_fwd: Path to ONNX (or engine) model for forward tracking.
    type: std::string
engine_cache_dir: Directory for cached TensorRT engine files.
    type: std::string
plugins_lib_namespace: TensorRT plugins library namespace.
    type: std::string
    default: ""
force_engine_update: Force rebuild of TensorRT engines even if a cached engine exists.
    type: bool
    default: false
input_tensor_names_init: Input tensor names for the Init model.
    type: std::vector<std::string>
input_binding_names_init: Corresponding TensorRT binding names for Init inputs.
    type: std::vector<std::string>
output_tensor_names_init: Output tensor names for the Init model.
    type: std::vector<std::string>
output_binding_names_init: Corresponding TensorRT binding names for Init outputs.
    type: std::vector<std::string>
input_tensor_names_fwd: Input tensor names for the Forward model.
    type: std::vector<std::string>
input_binding_names_fwd: Corresponding TensorRT binding names for Forward inputs.
    type: std::vector<std::string>
output_tensor_names_fwd: Output tensor names for the Forward model.
    type: std::vector<std::string>
output_binding_names_fwd: Corresponding TensorRT binding names for Forward outputs.
    type: std::vector<std::string>
state_tensor_names: Tensor names treated as internal state (preserved across steps).
    type: std::vector<std::string>
pool: Allocator instance for device tensor memory.
    type: std::shared_ptr<Allocator>
cuda_stream_pool: CUDA Stream Pool for asynchronous execution.
    type: std::shared_ptr<CudaStreamPool>
max_workspace_size: TensorRT builder max workspace size in bytes.
    type: int64_t
    default: 67108864 (64 MB)
max_batch_size: Max batch size for TensorRT optimization profiles.
    type: int32_t
    default: 1
enable_fp16: Enable FP16 precision (ignored on TensorRT >= 10.13).
    type: bool
    default: false
relaxed_dimension_check: Pad input rank with leading 1s when it is smaller than the binding rank.
    type: bool
    default: true
verbose: Enable verbose TensorRT and operator logging.
    type: bool
    default: false
grid_size: Grid dimension N for query point generation (N x N points).
    type: int32_t
    default: 15
grid_height: Image height used for query point grid spacing.
    type: int32_t
    default: 256
grid_width: Image width used for query point grid spacing.
    type: int32_t
    default: 256
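Two of these parameters describe behavior that is easier to see in code: the tensor-name and binding-name lists go together entry by entry, and relaxed_dimension_check pads a lower-rank input shape with leading 1s. The sketch below illustrates both in plain Python. `pair_names` and `pad_rank` are hypothetical helpers, and the positional pairing of the two lists is an assumption based on the word "corresponding" in the descriptions.

```python
from typing import Dict, Sequence, Tuple


def pair_names(tensor_names: Sequence[str], binding_names: Sequence[str]) -> Dict[str, str]:
    """Map each tensor name to the binding name at the same position (assumed pairing)."""
    if len(tensor_names) != len(binding_names):
        raise ValueError("tensor and binding name lists must have equal length")
    return dict(zip(tensor_names, binding_names))


def pad_rank(shape: Sequence[int], binding_rank: int, relaxed: bool = True) -> Tuple[int, ...]:
    """Pad a shape with leading 1s up to binding_rank when relaxed is True."""
    shape = tuple(shape)
    if len(shape) == binding_rank:
        return shape
    if relaxed and len(shape) < binding_rank:
        # Prepend as many 1s as needed to reach the binding rank.
        return (1,) * (binding_rank - len(shape)) + shape
    raise ValueError(f"rank {len(shape)} does not match binding rank {binding_rank}")


mapping = pair_names(["video", "query_points"], ["video", "query_points"])
padded = pad_rank((256, 256, 3), binding_rank=5)  # illustrative shape and rank
```

Here a rank-3 shape is padded to rank 5 by prepending two 1s; with relaxed set to False the same call raises an error instead.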

Inputs#

receivers (gxf::Entity) -- One or more input messages containing:
    step (int32 tensor, device) -- 0 selects the Init model, any other value selects the Forward model.
    frame / video (Tensor, device) -- The video frame to process.
    Any additional tensors matching configured input_tensor_names_* entries.
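To show what an upstream operator would need to send, the sketch below lists the tensors still missing from a message for the model that a given step selects. It rests on assumptions: the tensor names are borrowed from the Python example on this page, `missing_inputs` is a hypothetical helper, and query_points and the state tensors are treated as supplied by the operator itself rather than by the message.

```python
from typing import Iterable, List, Set

# Names borrowed from the Python example on this page.
INPUT_NAMES_INIT = ["video", "query_points"]
INPUT_NAMES_FWD = ["video", "query_points", "state"]
STATE_NAMES = ["state"]


def missing_inputs(step: int, message_tensors: Iterable[str]) -> List[str]:
    """List tensors the message still needs for the model that `step` selects."""
    present: Set[str] = set(message_tensors)
    wanted = INPUT_NAMES_INIT if step == 0 else INPUT_NAMES_FWD
    # Assumed to be supplied by the operator itself, not by the message.
    internal = {"query_points", *STATE_NAMES}
    required = ["step"] + [n for n in wanted if n not in internal]
    return [n for n in required if n not in present]
```

Under these assumptions, a message carrying only step and video is complete for both the Init and the Forward model.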

Outputs#

transmitter (gxf::Entity) -- Output message containing:
    All tensors listed in output_tensor_names_* for the selected model.
    step -- Passed through from the input.
State tensors are emitted as outputs and are also copied back into internal storage for the next tick.

Python API#

The operator is available as TapNextInferenceOp via the Python bindings:

from holohub.tapnext_inference import TapNextInferenceOp

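The constructor accepts the same parameters as keyword arguments. The example below is meant to run inside an application's compose() method, where self is the application and allocator and cuda_stream_pool are resources created beforehand: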

tapnext = TapNextInferenceOp(
    self,
    model_file_path_init="/path/to/init.onnx",
    model_file_path_fwd="/path/to/fwd.onnx",
    engine_cache_dir="/tmp/engines",
    input_tensor_names_init=["video", "query_points"],
    input_binding_names_init=["video", "query_points"],
    output_tensor_names_init=["tracks", "visible", "state"],
    output_binding_names_init=["tracks", "visible", "state"],
    input_tensor_names_fwd=["video", "query_points", "state"],
    input_binding_names_fwd=["video", "query_points", "state"],
    output_tensor_names_fwd=["tracks", "visible", "state"],
    output_binding_names_fwd=["tracks", "visible", "state"],
    state_tensor_names=["state"],
    pool=allocator,
    cuda_stream_pool=cuda_stream_pool,
    name="tapnext_inference",
)
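In the example above each binding-name list repeats its tensor-name list, and the state tensor appears in both the Forward inputs and the outputs. When that holds for a given model pair, the keyword arguments can be generated rather than typed out. The sketch below is one way to do this; `tapnext_kwargs` is a hypothetical helper, not part of the bindings, and it assumes tensor names and binding names are identical.

```python
from typing import Dict, List


def tapnext_kwargs(init_model: str, fwd_model: str, engine_cache_dir: str,
                   inputs: List[str], outputs: List[str],
                   state: List[str]) -> Dict[str, object]:
    """Build constructor keyword arguments, reusing tensor names as binding names."""
    # Forward inputs are the Init inputs plus the state tensors.
    fwd_inputs = inputs + [s for s in state if s not in inputs]
    # Both models emit the regular outputs plus the state tensors.
    all_outputs = outputs + [s for s in state if s not in outputs]
    kwargs: Dict[str, object] = {
        "model_file_path_init": init_model,
        "model_file_path_fwd": fwd_model,
        "engine_cache_dir": engine_cache_dir,
        "state_tensor_names": list(state),
    }
    for phase, ins in (("init", list(inputs)), ("fwd", fwd_inputs)):
        kwargs[f"input_tensor_names_{phase}"] = ins
        kwargs[f"input_binding_names_{phase}"] = list(ins)
        kwargs[f"output_tensor_names_{phase}"] = list(all_outputs)
        kwargs[f"output_binding_names_{phase}"] = list(all_outputs)
    return kwargs


kwargs = tapnext_kwargs(
    "/path/to/init.onnx", "/path/to/fwd.onnx", "/tmp/engines",
    inputs=["video", "query_points"], outputs=["tracks", "visible"], state=["state"],
)
# Inside compose(), the dictionary could then be unpacked:
# TapNextInferenceOp(self, pool=allocator, cuda_stream_pool=cuda_stream_pool,
#                    name="tapnext_inference", **kwargs)
```

The generated dictionary reproduces the twelve model-related arguments of the example above; pool, cuda_stream_pool, and name are still passed explicitly.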
