TapNext Inference Operator#
Authors: Holoscan Team (NVIDIA)
Supported platforms: x86_64, aarch64
Language: C++, Python
Last modified: March 13, 2026
Latest version: 1.0
Minimum Holoscan SDK version: 3.8.0
Tested Holoscan SDK versions: 3.8.0
Contribution metric: Level 1 - Highly Reliable
The tapnext_inference operator runs TensorRT inference for TapNext-style architectures,
where an initialization model processes the first frame and a forward model
processes all subsequent frames while maintaining internal state between steps. It is
designed for dense point tracking across video sequences.
This operator wraps the
TapNextInference GXF extension.
holoscan::ops::TapNextInferenceOp#
Operator class to perform TapNext inference (C++ and Python).
How It Works#
Each call receives one or more messages containing at least a step tensor (int32,
device memory) and a video frame:
- Step 0 (Init) -- The Init TensorRT engine runs. Its outputs include state tensors that are copied into internal storage for the next step.
- Step > 0 (Forward) -- The Forward TensorRT engine runs. State tensors from the previous step are fed as inputs, and updated state tensors are written back to internal storage after inference.
Query points are generated once at startup as an evenly-spaced grid and bound as the
query_points input on every tick.
Supported Model Formats#
- ONNX (
.onnx) -- Automatically converted to a TensorRT engine on first use and cached inengine_cache_dir. Conversion can take several minutes. - Pre-built TensorRT engines (
.engine/.plan) -- Used directly, skipping conversion.
Parameters#
model_file_path_init: Path to ONNX (or engine) model for initialization.- type:
std::string model_file_path_fwd: Path to ONNX (or engine) model for forward tracking.- type:
std::string engine_cache_dir: Directory for cached TensorRT engine files.- type:
std::string plugins_lib_namespace: TensorRT plugins library namespace.- type:
std::string - default:
"" force_engine_update: Force rebuild of TensorRT engines even if a cached engine exists.- type:
bool - default:
false input_tensor_names_init: Input tensor names for the Init model.- type:
std::vector<std::string> input_binding_names_init: Corresponding TensorRT binding names for Init inputs.- type:
std::vector<std::string> output_tensor_names_init: Output tensor names for the Init model.- type:
std::vector<std::string> output_binding_names_init: Corresponding TensorRT binding names for Init outputs.- type:
std::vector<std::string> input_tensor_names_fwd: Input tensor names for the Forward model.- type:
std::vector<std::string> input_binding_names_fwd: Corresponding TensorRT binding names for Forward inputs.- type:
std::vector<std::string> output_tensor_names_fwd: Output tensor names for the Forward model.- type:
std::vector<std::string> output_binding_names_fwd: Corresponding TensorRT binding names for Forward outputs.- type:
std::vector<std::string> state_tensor_names: Tensor names treated as internal state (preserved across steps).- type:
std::vector<std::string> pool: Allocator instance for device tensor memory.- type:
std::shared_ptr<Allocator> cuda_stream_pool: CUDA Stream Pool for asynchronous execution.- type:
std::shared_ptr<CudaStreamPool> max_workspace_size: TensorRT builder max workspace size in bytes.- type:
int64_t - default:
67108864(64 MB) max_batch_size: Max batch size for TensorRT optimization profiles.- type:
int32_t - default:
1 enable_fp16: Enable FP16 precision (ignored on TensorRT >= 10.13).- type:
bool - default:
false relaxed_dimension_check: Pad input rank with leading 1s when it is smaller than the binding rank.- type:
bool - default:
true verbose: Enable verbose TensorRT and operator logging.- type:
bool - default:
false grid_size: Grid dimension N for query point generation (N x N points).- type:
int32_t - default:
15 grid_height: Image height used for query point grid spacing.- type:
int32_t - default:
256 grid_width: Image width used for query point grid spacing.- type:
int32_t - default:
256
Inputs#
receivers(gxf::Entity) -- One or more input messages containing:step(int32tensor, device) --0selects the Init model, any other value selects the Forward model.frame/video(Tensor, device) -- The video frame to process.- Any additional tensors matching configured
input_tensor_names_*entries.
Outputs#
transmitter(gxf::Entity) -- Output message containing:- All tensors listed in
output_tensor_names_*for the selected model. step-- Passed through from the input.- State tensors are also emitted as outputs but are additionally copied back into internal storage for the next tick.
Python API#
The operator is available as TapNextInferenceOp via the Python bindings:
from holohub.tapnext_inference import TapNextInferenceOp
The constructor accepts the same parameters as keyword arguments:
tapnext = TapNextInferenceOp(
self,
model_file_path_init="/path/to/init.onnx",
model_file_path_fwd="/path/to/fwd.onnx",
engine_cache_dir="/tmp/engines",
input_tensor_names_init=["video", "query_points"],
input_binding_names_init=["video", "query_points"],
output_tensor_names_init=["tracks", "visible", "state"],
output_binding_names_init=["tracks", "visible", "state"],
input_tensor_names_fwd=["video", "query_points", "state"],
input_binding_names_fwd=["video", "query_points", "state"],
output_tensor_names_fwd=["tracks", "visible", "state"],
output_binding_names_fwd=["tracks", "visible", "state"],
state_tensor_names=["state"],
pool=allocator,
cuda_stream_pool=cuda_stream_pool,
name="tapnext_inference",
)