# Custom LSTM Inference
- Authors: Holoscan Team (NVIDIA)
- Supported platforms: x86_64, aarch64
- Language: C++, Python
- Last modified: October 9, 2025
- Latest version: 1.0
- Minimum Holoscan SDK version: 0.5.0
- Tested Holoscan SDK versions: 0.5.0
- Contribution metric: Level 1 - Highly Reliable
The `lstm_tensor_rt_inference` extension provides an LSTM (Long Short-Term Memory) stateful inference module built on TensorRT.
## `nvidia::holoscan::lstm_tensor_rt_inference::TensorRtInference`
Codelet that takes input tensors and feeds them into TensorRT for stateful LSTM inference.
This implementation is based on `nvidia::gxf::TensorRtInference`. The `input_state_tensor_names` and `output_state_tensor_names` parameters are added to specify the names of the state tensors in the LSTM model.
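The snippet below is a minimal sketch of how this codelet can be used from a Holoscan C++ application. It assumes the HoloHub operator wrapper `holoscan::ops::LSTMTensorRTInferenceOp`, a `lstm_inference` key in the application's YAML config, and an upstream `source` operator; none of these are defined in this document, so treat the names as placeholders.

```cpp
// Minimal sketch (assumptions noted above), not a verbatim example from this extension.
#include <holoscan/holoscan.hpp>
#include <lstm_tensor_rt_inference.hpp>  // hypothetical include path for the operator wrapper

class App : public holoscan::Application {
 public:
  void compose() override {
    using namespace holoscan;

    // Shared allocator and CUDA stream pool passed to the inference operator.
    auto pool = make_resource<UnboundedAllocator>("pool");
    auto cuda_stream_pool =
        make_resource<CudaStreamPool>("cuda_stream", 0, 0, 0, 1, 5);

    // Model path, tensor names, binding names, etc. come from the
    // (assumed) "lstm_inference" section of the YAML config file.
    auto lstm_inferer = make_operator<ops::LSTMTensorRTInferenceOp>(
        "lstm_inferer",
        from_config("lstm_inference"),
        Arg("pool") = pool,
        Arg("cuda_stream_pool") = cuda_stream_pool);

    // An upstream operator producing the input tensors would be connected here, e.g.:
    // add_flow(source, lstm_inferer, {{"output", "input"}});
  }
};
```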
### Parameters
- `model_file_path`: Path to the ONNX model to be loaded
  - type: `std::string`
- `engine_cache_dir`: Path to a directory containing cached generated engines to be serialized and loaded from
  - type: `std::string`
- `plugins_lib_namespace`: Namespace used to register all the plugins in this library (default: `""`)
  - type: `std::string`
- `force_engine_update`: Always update the engine regardless of an existing engine file; such a conversion may take minutes (default: `false`)
  - type: `bool`
- `input_tensor_names`: Names of input tensors in the order to be fed into the model
  - type: `std::vector<std::string>`
- `input_state_tensor_names`: Names of input state tensors that are used internally by TensorRT
  - type: `std::vector<std::string>`
- `input_binding_names`: Names of input bindings as in the model, in the same order as provided in `input_tensor_names`
  - type: `std::vector<std::string>`
- `output_tensor_names`: Names of output tensors in the order to be retrieved from the model
  - type: `std::vector<std::string>`
- `output_state_tensor_names`: Names of output state tensors that are used internally by TensorRT
  - type: `std::vector<std::string>`
- `output_binding_names`: Names of output bindings in the model, in the same order as provided in `output_tensor_names`
  - type: `std::vector<std::string>`
- `pool`: Allocator instance for output tensors
  - type: `gxf::Handle<gxf::Allocator>`
- `cuda_stream_pool`: Instance of `gxf::CudaStreamPool` used to allocate the CUDA stream
  - type: `gxf::Handle<gxf::CudaStreamPool>`
- `max_workspace_size`: Size of the working space in bytes (default: `67108864`, i.e. 64 MB)
  - type: `int64_t`
- `dla_core`: DLA core to use; fallback to GPU is always enabled; defaults to using the GPU only (optional)
  - type: `int32_t`
- `max_batch_size`: Maximum possible batch size, in case the first dimension is dynamic and used as the batch size (default: `1`)
  - type: `int32_t`
- `enable_fp16_`: Enable inference with FP16 and FP32 fallback (default: `false`)
  - type: `bool`
- `verbose`: Enable verbose logging on the console (default: `false`)
  - type: `bool`
- `relaxed_dimension_check`: Ignore dimensions of 1 for the input tensor dimension check (default: `true`)
  - type: `bool`
- `rx`: List of receivers to take input tensors from
  - type: `std::vector<gxf::Handle<gxf::Receiver>>`
- `tx`: Transmitter to publish output tensors on
  - type: `gxf::Handle<gxf::Transmitter>`
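As a hedged illustration of how the parameters above fit together, the sketch below (continuing the `compose()` example earlier) sets the LSTM-specific parameters with explicit `Arg` values instead of `from_config`. Every tensor name, binding name, and path here is a hypothetical placeholder and must be replaced with the names actually present in your ONNX model.

```cpp
// Illustration only: all tensor names, binding names, and paths below are
// hypothetical placeholders, not values defined by this extension.
// This code is assumed to run inside Application::compose().
auto lstm_inferer = make_operator<ops::LSTMTensorRTInferenceOp>(
    "lstm_inferer",
    Arg("model_file_path") = std::string("model/my_lstm_model.onnx"),
    Arg("engine_cache_dir") = std::string("model/engine_cache"),
    // Regular input tensor plus the two recurrent state tensors.
    Arg("input_tensor_names") =
        std::vector<std::string>{"source_video", "cellstate_in", "hiddenstate_in"},
    Arg("input_state_tensor_names") =
        std::vector<std::string>{"cellstate_in", "hiddenstate_in"},
    Arg("input_binding_names") =
        std::vector<std::string>{"data_ph:0", "cellstate_ph:0", "hiddenstate_ph:0"},
    // Model outputs, including the state tensors handled internally by the codelet.
    Arg("output_tensor_names") =
        std::vector<std::string>{"probs", "cellstate_out", "hiddenstate_out"},
    Arg("output_state_tensor_names") =
        std::vector<std::string>{"cellstate_out", "hiddenstate_out"},
    Arg("output_binding_names") =
        std::vector<std::string>{"probs:0", "cellstate_out:0", "hiddenstate_out:0"},
    Arg("enable_fp16_") = true,
    Arg("pool") = make_resource<holoscan::UnboundedAllocator>("pool"),
    Arg("cuda_stream_pool") =
        make_resource<holoscan::CudaStreamPool>("cuda_stream", 0, 0, 0, 1, 5));
```

Parameters that are omitted fall back to the defaults listed above.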