Real-Time End-to-end AI Surgical Video Workflow#
Authors: Holoscan Team (NVIDIA)
Supported platforms: x86_64, aarch64
Language: Python
Last modified: May 13, 2025
Latest version: 1.0
Minimum Holoscan SDK version: 3.0.0
Tested Holoscan SDK versions: 3.1.0
Contribution metric: Level 1 - Highly Reliable
Fig.1: The overall diagram illustrating the end-to-end pipeline for real-time AI surgical video processing. The pipeline achieves an average end-to-end latency of 37ms (maximum 54ms). Key latency components are shown: Holoscan Sensor Bridge (HSB) latency averages 21ms (max 28ms), and the AI application averages 16ms (median 17ms, 95th percentile 18ms, 99th percentile 22ms, max 26ms). These results demonstrate the solution's high-performance, low-latency capabilities for demanding surgical video applications.
Overview#
This reference application offers developers a modular, end-to-end pipeline that spans the entire sensor processing workflow—from sensor data ingestion and accelerated computing to AI inference, real-time visualization, and data stream output.
Specifically, we demonstrate a comprehensive real-time end-to-end AI surgical video pipeline that includes:
- Sensor I/O: Integration with the Holoscan Sensor Bridge, enabling GPUDirect data ingestion for ultra-low-latency input of surgical video feeds.
- Out-of-body detection to determine if the endoscope is inside or outside the patient's body, ensuring patient privacy by removing identifiable information.
- Dynamic flow control based on the out-of-body detection result.
- De-identification: pixelation of the image to anonymize out-of-body content such as people's faces (see the sketch after this list).
- Multi-AI: simultaneous execution of multiple models at inference time for surgical tool processing:
- SSD model for detecting surgical tools with bounding boxes
- MONAI model for pixel-level segmentation of endoscopic tools
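The de-identification step above amounts to coarse, block-wise pixelation of the whole frame whenever the camera is outside the body. Below is a minimal sketch of such a pixelation routine using CuPy so the frame can stay on the GPU; the function name, block size, and HxWxC layout are illustrative assumptions, not the workflow's actual implementation.

```python
import cupy as cp  # assumption: frames are already on the GPU as CuPy arrays


def pixelate(frame: cp.ndarray, block: int = 16) -> cp.ndarray:
    """Coarsen an H x W x C frame by averaging each (block x block) tile."""
    h, w, c = frame.shape
    # Crop to a multiple of the block size to keep the reshape simple.
    h2, w2 = (h // block) * block, (w // block) * block
    tiles = frame[:h2, :w2].reshape(h2 // block, block, w2 // block, block, c)
    means = tiles.mean(axis=(1, 3), keepdims=True)  # per-tile mean color
    pixelated = cp.broadcast_to(means, tiles.shape).reshape(h2, w2, c)
    return pixelated.astype(frame.dtype)
```

A larger block size removes more identifiable detail at the cost of a coarser preview; the workflow's own de-identification operator may be implemented differently.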
Architecture#
Fig.2: The workflow diagram representing all the Holoscan operators (in green) and Holoscan Sensor Bridge operators (in yellow). The source can be a Holoscan Sensor Bridge, an AJA card, or a video replayer.
Fig.3: Endoscopy image from a partial nephrectomy procedure (surgical removal of the diseased portion of the kidney) showing AI tool segmentation results when the camera is inside the body and a de-identified (pixelated) output image when the camera is outside of the body.
1. Out-of-Body Detection#
The workflow first determines if the endoscope is inside or outside the patient's body using an AI model.
2. Dynamic Flow Control#
- If outside the body: the video is de-identified through pixelation to protect patient privacy
- If inside the body: the video is processed by the multi-AI pipeline
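One way to express this routing in a Holoscan Python application is a small gate operator that receives each frame together with the classifier score and emits it on one of two output ports. This is a minimal sketch under assumed port and tensor names (`frame`, `oob_score`, `in_body`, `out_of_body`), not the workflow's actual operator; recent Holoscan SDK releases also offer dynamic flow control APIs that can replace this pattern.

```python
from holoscan.core import Operator, OperatorSpec


class OutOfBodyRouterOp(Operator):
    """Hypothetical gate: forwards each frame to the branch selected by the classifier."""

    def __init__(self, fragment, *args, threshold=0.5, **kwargs):
        self.threshold = threshold  # score above which the frame is treated as out-of-body
        super().__init__(fragment, *args, **kwargs)

    def setup(self, spec: OperatorSpec):
        spec.input("frame")
        spec.input("oob_score")
        spec.output("in_body")
        spec.output("out_of_body")

    def compute(self, op_input, op_output, context):
        frame = op_input.receive("frame")
        score = float(op_input.receive("oob_score"))  # assumed to arrive as a scalar
        if score >= self.threshold:
            op_output.emit(frame, "out_of_body")  # frame goes to the pixelation branch
        else:
            op_output.emit(frame, "in_body")      # frame goes to the multi-AI branch
```

Because only one port emits per frame, the downstream operators' input ports would need a non-blocking condition (e.g. `ConditionType.NONE`) so the unselected branch does not stall the pipeline.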
3. Multi-AI Processing#
When inside the body, two AI models run concurrently:
- SSD detection model identifies surgical tools with bounding boxes
- MONAI segmentation model provides pixel-level segmentation of tools
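Both models can be served by a single Holoscan inference operator configured with two entries in its model map and parallel inference enabled. The sketch below shows only the operator construction, not the full graph; the tensor names and relative model paths are assumptions for illustration, while the real workflow drives `InferenceOp` from its `config.yaml`.

```python
from holoscan.core import Application
from holoscan.operators import InferenceOp
from holoscan.resources import UnboundedAllocator


class MultiAISketch(Application):
    def compose(self):
        # Relative paths follow the data layout described under "Sample Data";
        # the real application resolves them from the --data directory.
        multi_ai = InferenceOp(
            self,
            name="multi_ai_inference",
            backend="trt",  # TensorRT backend
            model_path_map={
                "ssd_tool_detection": "ssd_model/epoch24_nms.onnx",
                "tool_segmentation": "monai_tool_seg_model/"
                "model_endoscopic_tool_seg_sanitized_nhwc_in_nchw_out.onnx",
            },
            # Input tensors each model consumes (hypothetical names).
            pre_processor_map={
                "ssd_tool_detection": ["ssd_preprocessed"],
                "tool_segmentation": ["seg_preprocessed"],
            },
            # Output tensors each model produces (hypothetical names).
            inference_map={
                "ssd_tool_detection": ["ssd_output"],
                "tool_segmentation": ["seg_output"],
            },
            parallel_inference=True,  # run both models concurrently
            allocator=UnboundedAllocator(self, name="allocator"),
        )
```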
4. Visualization#
The HolovizOp displays the processed video with overlaid AI results, including:
- Bounding boxes around detected tools
- Segmentation masks for tools
- Text labels for detected tools
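In Holoviz, each overlay is described as an input tensor layer with a type such as `color`, `rectangles`, `text`, or `color_lut`. Here is a minimal sketch of such a layer specification; the tensor names (`video`, `ssd_boxes`, `ssd_labels`, `tool_mask`), colors, and label strings are illustrative assumptions, and the actual layer definitions live in the workflow's configuration.

```python
from holoscan.core import Application
from holoscan.operators import HolovizOp


class VizSketch(Application):
    def compose(self):
        visualizer = HolovizOp(
            self,
            name="holoviz",
            window_title="AI Surgical Video",
            tensors=[
                # Background video frame (hypothetical tensor name).
                dict(name="video", type="color", opacity=1.0, priority=0),
                # Bounding boxes produced by the SSD postprocessor.
                dict(name="ssd_boxes", type="rectangles", color=[1.0, 0.0, 0.0, 1.0],
                     line_width=2.0, priority=1),
                # Text labels anchored at the detection positions.
                dict(name="ssd_labels", type="text", text=["tool"],
                     color=[1.0, 1.0, 1.0, 1.0], priority=2),
                # Segmentation mask colorized through the lookup table below.
                dict(name="tool_mask", type="color_lut", opacity=0.5, priority=1),
            ],
            # Two-entry lookup table: background transparent, tool class green.
            color_lut=[[0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.7]],
        )
```

Layer `priority` controls draw order and `opacity` lets the segmentation mask blend over the video frame.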
Requirements#
Software#
- Holoscan SDK v3.0: The Holohub command takes care of this dependency when using the Holohub container. Alternatively, you can install the Holoscan SDK via one of the methods specified in the SDK user guide.
- Holoscan Sensor Bridge v2.0: Please see the quick start guide for building the Holoscan Sensor Bridge Docker container.
Models#
This workflow utilizes the following three AI models:
Model | Description | File |
---|---|---|
📦️ Out-of-body Detection Model | Detects if endoscope is inside or outside the body | anonymization_model.onnx |
📦️ SSD Detection for Endoscopy Surgical Tools | Detects surgical tools with bounding boxes | epoch24_nms.onnx |
📦️ MONAI Endoscopic Tool Segmentation | Provides pixel-level segmentation of tools | model_endoscopic_tool_seg_sanitized_nhwc_in_nchw_out.onnx |
Sample Data#
- 📦️ Orsi partial nephrectomy procedures - Sample endoscopy video data for use with the `replayer` source

Note: The directory specified by `--data` at runtime is assumed to contain three subdirectories, corresponding to the NGC resources specified in Models and Sample Data: `orsi`, `monai_tool_seg_model`, and `ssd_model`. These resources will be automatically downloaded to the Holohub data directory when building the application.
Quick Start Guide#
Using AJA Card or Replayer as I/O#
./dev_container build_and_run ai_surgical_video
Using Holoscan Sensor Bridge as I/O#
When using the workflow with `--source hsb`, the Holoscan Sensor Bridge software must be installed. You can build a Holoscan Sensor Bridge container using the following commands:
git clone https://github.com/nvidia-holoscan/holoscan-sensor-bridge.git
cd holoscan-sensor-bridge
git checkout hsdk-3.0
./docker/build.sh
This will build a Docker image called `hololink-demo:2.0.0`.
Once you have built the Holoscan Sensor Bridge container, you can build the Holohub container and run the workflow using the following command:
./dev_container build_and_run --base_img hololink-demo:2.0.0 --img holohub:link ai_surgical_video --run_args " --source hsb"
Advanced Usage#
Building the Application#
First, you need to run the Holohub container:
./dev_container launch --img holohub:link
Then, you can build the workflow using the following command:
./run build ai_surgical_video
Running the Application#
Using the Holohub Container from Outside the Container#
Using the Holohub container, you can run the workflow without building it again:
./dev_container build_and_run --base_img hololink-demo:2.0.0 --img holohub:link --no_build ai_surgical_video
However, if you want to build the workflow, you can just remove the `--no_build` flag:
./dev_container build_and_run --base_img hololink-demo:2.0.0 --img holohub:link ai_surgical_video
Using the Holohub Container from Inside the Container#
First, you need to run the Holohub container:
./dev_container launch --img holohub:link
To run the Python application, you can make use of the run script:
./run launch ai_surgical_video
Alternatively, you can run the application directly:
cd <HOLOHUB_SOURCE_DIR>/workflows/ai_surgical_video/python
python3 ai_surgical_video.py --source hsb --data <DATA_DIR> --config <CONFIG_FILE>
TIP: You can get the exact "Run command" along with "Run environment" and "Run workdir" by executing:
./run launch ai_surgical_video --dryrun
Command Line Arguments#
The application accepts the following command line arguments:
Argument | Description | Default |
---|---|---|
`-s, --source` | Source of video input: `replayer`, `aja`, or `hsb` | `replayer` |
`-c, --config` | Path to a custom configuration file | `config.yaml` in the application directory |
`-d, --data` | Path to the data directory containing model and video files | Uses the `HOLOHUB_DATA_PATH` environment variable |
`--headless` | Run in headless mode (no visualization) | `False` |
`--fullscreen` | Run in fullscreen mode | `False` |
`--camera-mode` | Camera mode to use [0, 1, 2, 3] | `0` |
`--frame-limit` | Exit after receiving this many frames | No limit |
`--hololink` | IP address of the Hololink board | `192.168.0.2` |
`--log-level` | Logging level to display | `20` |
`--ibv-name` | IBV device to use | First available InfiniBand device |
`--ibv-port` | Port number of the IBV device | `1` |
`--expander-configuration` | I2C expander configuration (0 or 1) | `0` |
`--pattern` | Configure to display a test pattern (0-11) | None |
`--ptp-sync` | After reset, wait for PTP time to synchronize | `False` |
`--skip-reset` | Don't call reset on the Hololink device | `False` |
Benchmarking#
Please refer to Holoscan Benchmarking for how to perform benchmarking for this workflow.