
Custom LSTM Inference#

Authors: Holoscan Team (NVIDIA)
Supported platforms: x86_64, aarch64
Language: C++, Python
Last modified: October 9, 2025
Latest version: 1.0
Minimum Holoscan SDK version: 0.5.0
Tested Holoscan SDK versions: 0.5.0
Contribution metric: Level 1 - Highly Reliable

The lstm_tensor_rt_inference extension provides a stateful LSTM (Long Short-Term Memory) inference module using TensorRT.

nvidia::holoscan::lstm_tensor_rt_inference::TensorRtInference#

A codelet that takes input tensors and feeds them into TensorRT for LSTM inference.

This implementation is based on nvidia::gxf::TensorRtInference, with added input_state_tensor_names and output_state_tensor_names parameters to specify the names of the state tensors in the LSTM model.

Parameters#

  • model_file_path: Path to the ONNX model to be loaded
    • type: std::string
  • engine_cache_dir: Path to a directory containing cached generated engines to be serialized and loaded from
    • type: std::string
  • plugins_lib_namespace: Namespace used to register all the plugins in this library (default: "")
    • type: std::string
  • force_engine_update: Always update the engine regardless of any existing engine file. Such conversion may take minutes (default: false)
    • type: bool
  • input_tensor_names: Names of input tensors in the order to be fed into the model
    • type: std::vector<std::string>
  • input_state_tensor_names: Names of input state tensors that are used internally by TensorRT
    • type: std::vector<std::string>
  • input_binding_names: Names of input bindings in the model, in the same order as provided in input_tensor_names
    • type: std::vector<std::string>
  • output_tensor_names: Names of output tensors in the order to be retrieved from the model
    • type: std::vector<std::string>
  • output_state_tensor_names: Names of output state tensors that are used internally by TensorRT
    • type: std::vector<std::string>
  • output_binding_names: Names of output bindings in the model, in the same order as provided in output_tensor_names
    • type: std::vector<std::string>
  • pool: Allocator instance for output tensors
    • type: gxf::Handle<gxf::Allocator>
  • cuda_stream_pool: Instance of gxf::CudaStreamPool to allocate CUDA streams
    • type: gxf::Handle<gxf::CudaStreamPool>
  • max_workspace_size: Size of the workspace in bytes (default: 67108864 (64 MB))
    • type: int64_t
  • dla_core: DLA core to use. Fallback to GPU is always enabled. Defaults to GPU only (optional)
    • type: int32_t
  • max_batch_size: Maximum possible batch size in case the first dimension is dynamic and used as the batch size (default: 1)
    • type: int32_t
  • enable_fp16_: Enable inference with FP16 and FP32 fallback (default: false)
    • type: bool
  • verbose: Enable verbose logging on the console (default: false)
    • type: bool
  • relaxed_dimension_check: Ignore dimensions of 1 for the input tensor dimension check (default: true)
    • type: bool
  • rx: List of receivers to take input tensors
    • type: std::vector<gxf::Handle<gxf::Receiver>>
  • tx: Transmitter to publish output tensors
    • type: gxf::Handle<gxf::Transmitter>
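In a GXF/Holoscan application, these parameters are typically supplied through a YAML configuration file. The fragment below is a hypothetical sketch: the entry name, model path, and all tensor/binding names are illustrative assumptions, not values from this page.

```yaml
# Hypothetical configuration entry for the TensorRtInference codelet.
# All names and paths below are illustrative assumptions.
lstm_inference:
  model_file_path: model/my_lstm_model.onnx
  engine_cache_dir: model/my_lstm_model_engines
  input_tensor_names: ["source_video", "cellstate_in"]
  input_state_tensor_names: ["cellstate_in"]
  input_binding_names: ["data_ph", "cellstate_ph"]
  output_tensor_names: ["probs", "cellstate_out"]
  output_state_tensor_names: ["cellstate_out"]
  output_binding_names: ["probs", "cellstate_out"]
  force_engine_update: false
  max_workspace_size: 67108864
  verbose: false
```

Note how each state tensor name appears in both the regular tensor-name list and the corresponding state-tensor-name list: the state lists mark which of the model's inputs/outputs carry the recurrent LSTM state between invocations.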

API Reference#

Python#

LSTMTensorRTInferenceOp#

Operator class to perform inference using an LSTM model.

Constructor Parameters#
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| fragment | Fragment | Required | The fragment that the operator belongs to. |
| input_tensor_names | sequence of str | Required | Names of input tensors in the order to be fed into the model. |
| output_tensor_names | sequence of str | Required | Names of output tensors in the order to be retrieved from the model. |
| input_binding_names | sequence of str | Required | Names of input bindings in the model, in the same order as provided in input_tensor_names. |
| output_binding_names | sequence of str | Required | Names of output bindings in the model, in the same order as provided in output_tensor_names. |
| model_file_path | str | Required | Path to the ONNX model to be loaded. |
| engine_cache_dir | str | Required | Path to a folder containing cached engine files to be serialized and loaded from. |
| pool | holoscan.resources.Allocator | Required | Allocator instance for output tensors. |
| cuda_stream_pool | holoscan.resources.CudaStreamPool | Required | CudaStreamPool instance to allocate CUDA streams. |
| plugins_lib_namespace | str | Required | Namespace used to register all the plugins in this library. |
| input_state_tensor_names | sequence of str | Optional | Names of input state tensors that are used internally by TensorRT. |
| output_state_tensor_names | sequence of str | Optional | Names of output state tensors that are used internally by TensorRT. |
| force_engine_update | bool | Optional | Always update the engine regardless of whether there is an existing engine file. Warning: this may take minutes to complete, so it is False by default. |
| enable_fp16 | bool | Optional | Enable inference with FP16 and FP32 fallback. |
| verbose | bool | Optional | Enable verbose logging to the console. |
| relaxed_dimension_check | bool | Optional | Ignore dimensions of 1 for the input tensor dimension check. |
| max_workspace_size | int | Optional | Size of the workspace in bytes. |
| max_batch_size | int | Optional | Maximum possible batch size in case the first dimension is dynamic and used as the batch size. |
| name | str | Optional | The name of the operator. |
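As a usage sketch, the operator can be constructed inside an application's compose() method. Everything below that is not in the table above is an assumption for illustration: the import path (holohub.lstm_tensor_rt_inference), the model path, the tensor and binding names, and the upstream/downstream operators.

```python
# Hypothetical sketch: wiring LSTMTensorRTInferenceOp into a Holoscan application.
# The import path, model path, and tensor/binding names are illustrative assumptions.
from holoscan.core import Application
from holoscan.resources import CudaStreamPool, UnboundedAllocator

from holohub.lstm_tensor_rt_inference import LSTMTensorRTInferenceOp  # assumed import path


class LSTMApp(Application):
    def compose(self):
        lstm_inference = LSTMTensorRTInferenceOp(
            self,
            name="lstm_inference",
            model_file_path="model/my_lstm_model.onnx",      # assumption
            engine_cache_dir="model/my_lstm_model_engines",  # assumption
            # Regular tensors plus the recurrent state tensors:
            input_tensor_names=["source_video", "cellstate_in"],
            input_state_tensor_names=["cellstate_in"],
            input_binding_names=["data_ph", "cellstate_ph"],
            output_tensor_names=["probs", "cellstate_out"],
            output_state_tensor_names=["cellstate_out"],
            output_binding_names=["probs", "cellstate_out"],
            plugins_lib_namespace="",
            pool=UnboundedAllocator(self, name="pool"),
            cuda_stream_pool=CudaStreamPool(self, name="cuda_stream"),
        )
        # Connect a (hypothetical) preprocessor and postprocessor:
        # self.add_flow(preprocessor, lstm_inference)
        # self.add_flow(lstm_inference, postprocessor)
```

The state tensors are fed back internally between invocations, so upstream operators only need to supply the non-state inputs (here, "source_video").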
Methods#
  • gxf_typename: The GXF type name of the resource.
  • initialize: Initialize the operator.
  • setup: Define the operator specification.