
Real-Time End-to-end AI Surgical Video Workflow

Authors: Holoscan Team (NVIDIA)
Supported platforms: x86_64, aarch64
Last modified: March 18, 2025
Language: Python
Latest version: 1.0
Minimum Holoscan SDK version: 3.0.0
Tested Holoscan SDK versions: 3.0.0
Contribution metric: Level 1 - Highly Reliable

Fig. 1: Overall diagram of the end-to-end pipeline for real-time AI surgical video processing. The workflow achieves an average end-to-end latency of 37 ms, with a maximum of 54 ms. The latency breakdown below the diagram shows HSB latency (avg = 21 ms, max = 28 ms) and AI application latency (avg = 16 ms, max = 26 ms).

Overview

This reference application offers developers a modular, end-to-end pipeline that spans the entire sensor processing workflow—from sensor data ingestion and accelerated computing to AI inference, real-time visualization, and data stream output.

Specifically, we demonstrate a comprehensive real-time end-to-end AI surgical video pipeline that includes the following stages (a wiring sketch follows the list):

  1. Sensor I/O: integration with the Holoscan Sensor Bridge, enabling GPUDirect data ingestion for ultra-low-latency input of surgical video feeds.
  2. Out-of-body detection: an AI model determines whether the endoscope is inside or outside the patient's body, so that identifiable information can be removed to protect patient privacy.
  3. Dynamic flow control: frames are routed to the appropriate branch based on the out-of-body detection result.
  4. De-identification: pixelates the image to anonymize out-of-body elements such as people's faces.
  5. Multi-AI: simultaneous execution of multiple models at inference time for surgical tool processing:
       • SSD detection for surgical tool detection
       • MONAI segmentation for endoscopic tool segmentation
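
As an orientation aid, the sketch below shows how the first of these stages (source, preprocessing, out-of-body inference, and visualization) might be composed with the Holoscan SDK Python API. It is a minimal sketch, not the workflow's actual source: the operator parameters, tensor names (oob_preprocessed, oob_scores), and data paths are illustrative assumptions.

from holoscan.core import Application
from holoscan.operators import (
    FormatConverterOp,
    HolovizOp,
    InferenceOp,
    VideoStreamReplayerOp,
)
from holoscan.resources import UnboundedAllocator


class AISurgicalVideoSketch(Application):
    def compose(self):
        allocator = UnboundedAllocator(self, name="allocator")

        # Sensor I/O: a video replayer stands in for the sensor bridge here.
        source = VideoStreamReplayerOp(
            self, name="source", directory="data/orsi", basename="video"
        )

        # Convert frames to the tensor layout the detection model expects.
        preprocessor = FormatConverterOp(
            self,
            name="preprocessor",
            pool=allocator,
            out_dtype="float32",
            out_tensor_name="oob_preprocessed",
        )

        # Out-of-body detection (single-model inference).
        oob_inference = InferenceOp(
            self,
            name="oob_inference",
            backend="trt",
            allocator=allocator,
            model_path_map={"out_of_body": "data/orsi/anonymization_model.onnx"},
            pre_processor_map={"out_of_body": ["oob_preprocessed"]},
            inference_map={"out_of_body": ["oob_scores"]},
        )

        # Visualization; the routing, de-identification, and multi-AI stages
        # are omitted here and sketched in the Architecture section below.
        visualizer = HolovizOp(self, name="visualizer")

        self.add_flow(source, preprocessor)
        self.add_flow(preprocessor, oob_inference, {("tensor", "receivers")})
        self.add_flow(source, visualizer, {("output", "receivers")})


if __name__ == "__main__":
    AISurgicalVideoSketch().run()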

Architecture

Fig. 2: The workflow diagram showing all the Holoscan operators (green) and Holoscan Sensor Bridge operators (yellow). The source can be a Holoscan Sensor Bridge, an AJA card, or a video replayer.

Fig. 3: Sample output images: an endoscopy image from a partial nephrectomy procedure (surgical removal of the diseased portion of a kidney) showing AI tool segmentation results while the camera is inside the body, and a de-identified (pixelated) output image when the camera is outside the body.

1. Out-of-Body Detection

The workflow first determines if the endoscope is inside or outside the patient's body using an AI model.

2. Dynamic Flow Control

  • If outside the body: the video is de-identified through pixelation to protect patient privacy.
  • If inside the body: the video is processed by the multi-AI pipeline.
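
One way to express this routing is a custom operator with two named outputs that forwards each frame to exactly one branch; a downstream operator only executes once a message arrives on its connected input. The sketch below is a hypothetical illustration: the port names, the assumption that an upstream postprocessor emits the score as a Python float, and the 0.5 threshold are ours, not the workflow's. (The Holoscan SDK also offers dynamic flow control APIs; see the workflow source for the exact mechanism used.)

from holoscan.core import Operator, OperatorSpec


class OutOfBodyRouterOp(Operator):
    """Routes each frame to exactly one of two branches per tick."""

    def setup(self, spec: OperatorSpec):
        spec.input("frame")
        spec.input("oob_score")
        spec.output("outside_body")  # to pixelation / de-identification
        spec.output("inside_body")   # to multi-AI detection / segmentation

    def compute(self, op_input, op_output, context):
        frame = op_input.receive("frame")
        # Assumes an upstream postprocessor emits the probability that the
        # endoscope is outside the body as a plain Python float.
        score = op_input.receive("oob_score")
        if score > 0.5:  # threshold is illustrative
            op_output.emit(frame, "outside_body")
        else:
            op_output.emit(frame, "inside_body")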

3. Multi-AI Processing

When inside the body, two AI models run concurrently:

  • SSD detection model identifies surgical tools with bounding boxes
  • MONAI segmentation model provides pixel-level segmentation of tools
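
Holoscan's InferenceOp can host several models in a single node and execute them concurrently. The fragment below, assumed to sit inside Application.compose() and continuing the earlier sketch, shows how the two models might be declared; the model keys, tensor names, and paths are illustrative.

from holoscan.operators import InferenceOp

# Inside Application.compose(), continuing the earlier sketch:
multi_ai = InferenceOp(
    self,
    name="multi_ai_inference",
    backend="trt",
    allocator=allocator,
    parallel_inference=True,  # run both models concurrently
    model_path_map={
        "ssd": "data/ssd_model/epoch24_nms.onnx",
        "tool_seg": (
            "data/monai_tool_seg_model/"
            "model_endoscopic_tool_seg_sanitized_nhwc_in_nchw_out.onnx"
        ),
    },
    pre_processor_map={
        "ssd": ["ssd_preprocessed"],
        "tool_seg": ["seg_preprocessed"],
    },
    inference_map={
        "ssd": ["ssd_output"],
        "tool_seg": ["seg_output"],
    },
)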

4. Visualization

The HolovizOp displays the processed video with overlaid AI results, including:

  • Bounding boxes around detected tools
  • Segmentation masks for tools
  • Text labels for detected tools
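
As a rough illustration of this configuration, HolovizOp accepts static tensor specs describing each overlay; the tensor names, colors, and two-entry lookup table below are assumptions, not the workflow's actual settings.

from holoscan.operators import HolovizOp

# Inside Application.compose(); tensor names and colors are illustrative.
visualizer = HolovizOp(
    self,
    name="visualizer",
    # Two-entry lookup table for the segmentation overlay: class 0
    # (background) transparent, class 1 (tool) semi-transparent red.
    color_lut=[[0.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.5]],
    tensors=[
        dict(name="", type="color"),  # the video frame itself
        dict(name="ssd_boxes", type="rectangles",
             color=[1.0, 1.0, 0.0, 1.0], line_width=2),
        dict(name="seg_mask", type="color_lut"),
        dict(name="tool_labels", type="text",
             text=["tool"], color=[1.0, 1.0, 1.0, 1.0]),
    ],
)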

Requirements

Software

  • Holoscan SDK v3.0: the Holohub command takes care of this dependency when using the Holohub container. Alternatively, you can install the Holoscan SDK via one of the methods specified in the SDK user guide.
  • Holoscan Sensor Bridge v2.0: please see the Quick start guide for building the Holoscan Sensor Bridge Docker container.

Models

This workflow utilizes the following three AI models:

Model                                          | Description                                                  | File
-----------------------------------------------|--------------------------------------------------------------|---------------------------
📦️ Out-of-body Detection Model                 | Detects whether the endoscope is inside or outside the body | anonymization_model.onnx
📦️ SSD Detection for Endoscopy Surgical Tools  | Detects surgical tools with bounding boxes                   | epoch24_nms.onnx
📦️ MONAI Endoscopic Tool Segmentation          | Provides pixel-level segmentation of tools                   | model_endoscopic_tool_seg_sanitized_nhwc_in_nchw_out.onnx

Sample Data

Note: The directory specified by --data at runtime is assumed to contain three subdirectories, corresponding to the NGC resources specified in Models and Sample Data: orsi, monai_tool_seg_model and ssd_model. These resources will be automatically downloaded to the Holohub data directory when building the application.
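
For example, the layout under the data directory is expected to look roughly like this:

data/
├── orsi/                    (out-of-body detection model and sample video)
├── monai_tool_seg_model/    (MONAI tool segmentation model)
└── ssd_model/               (SSD detection model)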

Quick Start Guide

Using AJA Card or Replayer as I/O

./dev_container build_and_run ai_surgical_video

Using Holoscan Sensor Bridge as I/O

Running the workflow with --source hsb requires the Holoscan Sensor Bridge software to be installed. You can build a Holoscan Sensor Bridge container using the following commands:

git clone https://github.com/nvidia-holoscan/holoscan-sensor-bridge.git
cd holoscan-sensor-bridge
git checkout hsdk-3.0
./docker/build.sh

This builds a Docker image called hololink-demo:2.0.0.

Once you have built the Holoscan Sensor Bridge container, you can build the Holohub container using the following command:

./dev_container build_and_run --base_img hololink-demo:2.0.0 --img holohub:link ai_surgical_video

Advanced Usage

Building the Application

First, you need to run the Holohub container:

./dev_container launch --img holohub:link 

Then, you can build the workflow using the following command:

./run build ai_surgical_video

Running the Application

Using the Holohub Container from Outside the Container

Using the Holohub container, you can run the workflow without building it again:

./dev_container build_and_run --base_img hololink-demo:2.0.0 --img holohub:link --no_build ai_surgical_video

If you want to rebuild the workflow, simply remove the --no_build flag:

./dev_container build_and_run --base_img hololink-demo:2.0.0 --img holohub:link ai_surgical_video

Using the Holohub Container from Inside the Container

First, you need to run the Holohub container:

./dev_container launch --img holohub:link 

To run the Python application, you can make use of the run script:

./run launch ai_surgical_video

Alternatively, you can run the application directly:

cd <HOLOHUB_SOURCE_DIR>/workflows/ai_surgical_video/python
python3 ai_surgical_video.py --source hsb --data <DATA_DIR> --config <CONFIG_FILE>

TIP: You can get the exact "Run command" along with "Run environment" and "Run workdir" by executing:

./run launch ai_surgical_video --dryrun

Command Line Arguments

The application accepts the following command line arguments:

Argument                  | Description                                                  | Default
--------------------------|--------------------------------------------------------------|------------------------------------------
-s, --source              | Source of video input: replayer, aja, or hsb                 | replayer
-c, --config              | Path to a custom configuration file                          | config.yaml in the application directory
-d, --data                | Path to the data directory containing model and video files  | the HOLOHUB_DATA_PATH environment variable
--headless                | Run in headless mode (no visualization)                      | False
--fullscreen              | Run in fullscreen mode                                       | False
--camera-mode             | Camera mode to use (0, 1, 2, or 3)                           | 0
--frame-limit             | Exit after receiving this many frames                        | no limit
--hololink                | IP address of the Hololink board                             | 192.168.0.2
--log-level               | Logging level to display                                     | 20
--ibv-name                | IBV device to use                                            | first available InfiniBand device
--ibv-port                | Port number of the IBV device                                | 1
--expander-configuration  | I2C expander configuration (0 or 1)                          | 0
--pattern                 | Display a test pattern (0-11)                                | none
--ptp-sync                | After reset, wait for PTP time to synchronize                | False
--skip-reset              | Don't call reset on the Hololink device                      | False
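
For example, an illustrative invocation that streams from a sensor bridge with PTP synchronization and debug-level logging (the argument values are examples only):

python3 ai_surgical_video.py --source hsb --hololink 192.168.0.2 --ptp-sync --log-level 10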

Benchmarking

Please refer to Holoscan Benchmarking for how to perform benchmarking for this workflow.