Real-Time End-to-end AI Surgical Video Workflow¶
Authors: Holoscan Team (NVIDIA)
Supported platforms: x86_64, aarch64
Last modified: March 18, 2025
Language: Python
Latest version: 1.0
Minimum Holoscan SDK version: 3.0.0
Tested Holoscan SDK versions: 3.0.0
Contribution metric: Level 1 - Highly Reliable
Fig.1: The overall diagram illustrating the end-to-end pipeline for real-time AI surgical video processing. This workflow achieves an end-to-end latency of 37ms on average with a maximum of 54ms. The latency breakdown shown below the diagram indicates HSB latency (Avg=21ms, Max=28ms) and AI Application latency (Avg=16ms, Max=26ms), demonstrating the high-performance capabilities of this solution.
Overview¶
This reference application offers developers a modular, end-to-end pipeline that spans the entire sensor processing workflow: from sensor data ingestion and accelerated computing to AI inference, real-time visualization, and data stream output.
Specifically, we demonstrate a comprehensive real-time end-to-end AI surgical video pipeline that includes (a minimal composition sketch follows this list):
- Sensor I/O: integration with the Holoscan Sensor Bridge, enabling GPUDirect data ingestion for ultra-low-latency input of surgical video feeds.
- Out-of-body detection: determines whether the endoscope is inside or outside the patient's body, ensuring patient privacy by removing identifiable information.
- Dynamic flow control: routes frames conditionally based on the out-of-body detection results.
- De-identification: pixelates the image to anonymize out-of-body content such as people's faces.
- Multi-AI: simultaneous execution of multiple models at inference, processing surgical tools with:
  - an SSD detection model for surgical tool detection
  - a MONAI segmentation model for endoscopic tool segmentation
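The stages above map onto Holoscan operators wired together in an application's `compose()` method. Below is a minimal, hedged skeleton using the Holoscan SDK Python API with a replayer source; the placeholder paths, the video basename, and the omitted AI stages are assumptions for illustration, not the workflow's exact implementation.

```python
from holoscan.core import Application
from holoscan.operators import FormatConverterOp, HolovizOp, VideoStreamReplayerOp
from holoscan.resources import UnboundedAllocator


class SurgicalVideoSkeletonApp(Application):
    """Skeleton only: replayer source -> format conversion -> display.

    The real workflow inserts the out-of-body detection, dynamic flow
    control, and multi-AI stages between conversion and display.
    """

    def compose(self):
        pool = UnboundedAllocator(self, name="pool")

        # Video source; the workflow can alternatively use HSB or an AJA card.
        source = VideoStreamReplayerOp(
            self,
            name="replayer",
            directory="<DATA_DIR>/orsi",  # placeholder data path
            basename="<VIDEO_BASENAME>",  # placeholder video basename
        )

        # Convert frames into RGB tensors for downstream AI operators.
        preprocessor = FormatConverterOp(
            self, name="preprocessor", pool=pool, out_dtype="rgb888"
        )

        # Render the (eventually AI-annotated) video stream.
        visualizer = HolovizOp(self, name="holoviz")

        self.add_flow(source, preprocessor)
        self.add_flow(preprocessor, visualizer, {("", "receivers")})


if __name__ == "__main__":
    SurgicalVideoSkeletonApp().run()
```

The sections below describe where the detection, routing, and multi-AI stages plug into this skeleton.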
Architecture¶
Fig.2: The workflow diagram representing all the holoscan operators (in green) and holoscan sensor bridge operators (in yellow). The source can be a Holoscan Sensor Bridge, an AJA Card or a video replayer.
Fig.3: Endoscopy image from a partial nephrectomy procedure (surgical removal of the diseased portion of the kidney) showing AI tool segmentation results when the camera is inside the body and a deidentified (pixelated) output image when the camera is outside of the body.
1. Out-of-Body Detection¶
The workflow first determines if the endoscope is inside or outside the patient's body using an AI model.
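In practice, the model's output must be reduced to a boolean decision that downstream routing can consume. A hypothetical postprocessor sketch follows; the output tensor name `out_of_body_inferred` and the 0.5 threshold are assumptions, not the workflow's exact values.

```python
import cupy as cp
from holoscan.core import Operator, OperatorSpec


class OutOfBodyPostprocessorOp(Operator):
    """Hypothetical postprocessor: reduces the model output to an in/out flag."""

    def setup(self, spec: OperatorSpec):
        spec.input("in")
        spec.output("is_out_of_body")

    def compute(self, op_input, op_output, context):
        message = op_input.receive("in")
        # "out_of_body_inferred" is an assumed output tensor name; the real
        # model's output name and score semantics may differ.
        score = float(cp.asarray(message.get("out_of_body_inferred")).ravel()[0])
        op_output.emit(score > 0.5, "is_out_of_body")
```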
2. Dynamic Flow Control¶
- If outside the body: the video is de-identified through pixelation to protect privacy
- If inside the body: the video is processed by the multi-AI pipeline
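Holoscan SDK 3.0 provides dynamic flow control, which fits this routing pattern. The following is a self-contained toy sketch, assuming the SDK's `set_dynamic_flows`/`add_dynamic_flow` API; the operator names and the alternating in/out condition are illustrative stand-ins for the real detection-driven decision.

```python
from holoscan.conditions import CountCondition
from holoscan.core import Application, Operator, OperatorSpec


class FrameSourceOp(Operator):
    """Toy source that alternates between 'inside' and 'outside' frames."""

    def __init__(self, fragment, *args, **kwargs):
        self.count = 0
        self.last_out_of_body = False
        super().__init__(fragment, *args, **kwargs)

    def setup(self, spec: OperatorSpec):
        spec.output("out")

    def compute(self, op_input, op_output, context):
        self.count += 1
        # In the real workflow this flag comes from the detection model.
        self.last_out_of_body = self.count % 2 == 0
        op_output.emit({"frame": self.count}, "out")


class BranchOp(Operator):
    """Toy stand-in for the de-identification or multi-AI branch."""

    def __init__(self, fragment, *args, label="", **kwargs):
        self.label = label
        super().__init__(fragment, *args, **kwargs)

    def setup(self, spec: OperatorSpec):
        spec.input("in")

    def compute(self, op_input, op_output, context):
        message = op_input.receive("in")
        print(f"{self.label}: frame {message['frame']}")


class RoutingApp(Application):
    def compose(self):
        source = FrameSourceOp(self, CountCondition(self, 6), name="source")
        pixelate = BranchOp(self, name="pixelate", label="de-identify")
        multi_ai = BranchOp(self, name="multi_ai", label="multi-AI")

        # Declare every possible route statically ...
        self.add_flow(source, pixelate)
        self.add_flow(source, multi_ai)

        # ... then pick exactly one route per frame at runtime.
        def route(op):
            if op.last_out_of_body:
                op.add_dynamic_flow(pixelate)
            else:
                op.add_dynamic_flow(multi_ai)

        self.set_dynamic_flows(source, route)


if __name__ == "__main__":
    RoutingApp().run()
```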
3. Multi-AI Processing¶
When inside the body, two AI models run concurrently (see the sketch after this list):
- SSD detection model identifies surgical tools with bounding boxes
- MONAI segmentation model provides pixel-level segmentation of tools
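One idiomatic way to express concurrent multi-model inference in Holoscan is a single `InferenceOp` configured with a model map. Below is a hedged sketch intended for use inside `compose()`; the tensor names in the maps are illustrative assumptions, while the model files are those listed under Models.

```python
from holoscan.operators import InferenceOp
from holoscan.resources import UnboundedAllocator

# Hedged sketch: one InferenceOp running both tool models side by side.
multi_ai = InferenceOp(
    self,  # the composing Application/Fragment (inside compose())
    name="multi_ai_inference",
    backend="trt",  # TensorRT backend
    allocator=UnboundedAllocator(self, name="inference_allocator"),
    model_path_map={
        "ssd_tool_detection": "<DATA_DIR>/ssd_model/epoch24_nms.onnx",
        "monai_tool_seg": (
            "<DATA_DIR>/monai_tool_seg_model/"
            "model_endoscopic_tool_seg_sanitized_nhwc_in_nchw_out.onnx"
        ),
    },
    # Input/output tensor names below are assumptions for illustration.
    pre_processor_map={
        "ssd_tool_detection": ["ssd_preprocessed"],
        "monai_tool_seg": ["seg_preprocessed"],
    },
    inference_map={
        "ssd_tool_detection": ["ssd_inferred"],
        "monai_tool_seg": ["seg_inferred"],
    },
    parallel_inference=True,  # run the models concurrently
)
```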
4. Visualization¶
The HolovizOp displays the processed video with overlaid AI results (see the sketch after this list), including:
- Bounding boxes around detected tools
- Segmentation masks for tools
- Text labels for detected tools
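Each overlay corresponds to a named input tensor declared in the HolovizOp configuration. A hedged sketch of what such a configuration could look like inside `compose()`; the tensor names, colors, and two-entry lookup table are illustrative assumptions.

```python
from holoscan.operators import HolovizOp

# Hedged sketch of an overlay configuration; tensor names are illustrative
# assumptions, not the exact names used by this workflow.
visualizer = HolovizOp(
    self,  # the composing Application/Fragment (inside compose())
    name="holoviz",
    tensors=[
        dict(name="", type="color"),  # the video frame itself
        dict(  # bounding boxes from the SSD model
            name="ssd_boxes", type="rectangles",
            color=[1.0, 0.0, 0.0, 1.0], line_width=2,
        ),
        dict(name="seg_mask", type="color_lut"),  # per-pixel tool classes
        dict(  # labels for the detected tools
            name="ssd_labels", type="text", color=[1.0, 1.0, 1.0, 1.0],
        ),
    ],
    # Lookup table mapping segmentation class ids to overlay colors.
    color_lut=[
        [0.0, 0.0, 0.0, 0.0],  # background: fully transparent
        [0.1, 0.6, 0.9, 0.6],  # tool class: semi-transparent blue
    ],
)
```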
Requirements¶
Software¶
- Holoscan SDK v3.0: the Holohub command takes care of this dependency when using the Holohub container. Alternatively, you can install the Holoscan SDK via one of the methods specified in the SDK user guide.
- Holoscan Sensor Bridge v2.0: please see the Quick start guide for building the Holoscan Sensor Bridge docker container.
Models¶
This workflow utilizes the following three AI models:
| Model | Description | File |
|---|---|---|
| 📦️ Out-of-body Detection Model | Detects whether the endoscope is inside or outside the body | `anonymization_model.onnx` |
| 📦️ SSD Detection for Endoscopy Surgical Tools | Detects surgical tools with bounding boxes | `epoch24_nms.onnx` |
| 📦️ MONAI Endoscopic Tool Segmentation | Provides pixel-level segmentation of tools | `model_endoscopic_tool_seg_sanitized_nhwc_in_nchw_out.onnx` |
Sample Data¶
- 📦️ Orsi partial nephrectomy procedures - Sample endoscopy video data for use with the `replayer` source

Note: The directory specified by `--data` at runtime is assumed to contain three subdirectories corresponding to the NGC resources specified in Models and Sample Data: `orsi`, `monai_tool_seg_model`, and `ssd_model`. These resources are downloaded automatically to the Holohub data directory when building the application.
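For orientation, the expected layout under the data directory would look roughly like this (subdirectory names from the note above; contents abbreviated):

```
<DATA_DIR>/
├── orsi/                  # Orsi sample video data
├── monai_tool_seg_model/  # MONAI endoscopic tool segmentation model
└── ssd_model/             # SSD surgical tool detection model
```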
Quick Start Guide¶
Using AJA Card or Replayer as I/O¶
./dev_container build_and_run ai_surgical_video
Using Holoscan Sensor Bridge as I/O¶
When using the workflow with `--source hsb`, the Holoscan Sensor Bridge software must be installed. You can build a Holoscan Sensor Bridge container using the following commands:
git clone https://github.com/nvidia-holoscan/holoscan-sensor-bridge.git
cd holoscan-sensor-bridge
git checkout hsdk-3.0
./docker/build.sh
This builds a Docker image called `hololink-demo:2.0.0`.
Once you have built the Holoscan Sensor Bridge container, you can build the Holohub container using the following command:
./dev_container build_and_run --base_img hololink-demo:2.0.0 --img holohub:link ai_surgical_video
Advanced Usage¶
Building the Application¶
First, you need to run the Holohub container:
./dev_container launch --img holohub:link
Then, you can build the workflow using the following command:
./run build ai_surgical_video
Running the Application¶
Use Holohub Container from Outside of the Container¶
Using the Holohub container, you can run the workflow without building it again:
./dev_container build_and_run --base_img hololink-demo:2.0.0 --img holohub:link --no_build ai_surgical_video
However, if you want to build the workflow, you can just remove the `--no_build` flag:
./dev_container build_and_run --base_img hololink-demo:2.0.0 --img holohub:link ai_surgical_video
Use Holohub Container from Inside the Container¶
First, you need to run the Holohub container:
./dev_container launch --img holohub:link
To run the Python application, you can make use of the run script:
./run launch ai_surgical_video
Alternatively, you can run the application directly:
cd <HOLOHUB_SOURCE_DIR>/workflows/ai_surgical_video/python
python3 ai_surgical_video.py --source hsb --data <DATA_DIR> --config <CONFIG_FILE>
TIP: You can get the exact "Run command" along with "Run environment" and "Run workdir" by executing:
./run launch ai_surgical_video --dryrun
Command Line Arguments¶
The application accepts the following command line arguments (an example invocation follows the table):

| Argument | Description | Default |
|---|---|---|
| `-s, --source` | Source of video input: `replayer`, `aja`, or `hsb` | `replayer` |
| `-c, --config` | Path to a custom configuration file | `config.yaml` in the application directory |
| `-d, --data` | Path to the data directory containing model and video files | Uses the `HOLOHUB_DATA_PATH` environment variable |
| `--headless` | Run in headless mode (no visualization) | `False` |
| `--fullscreen` | Run in fullscreen mode | `False` |
| `--camera-mode` | Camera mode to use [0,1,2,3] | `0` |
| `--frame-limit` | Exit after receiving this many frames | No limit |
| `--hololink` | IP address of the Hololink board | `192.168.0.2` |
| `--log-level` | Logging level to display | `20` |
| `--ibv-name` | IBV device to use | First available InfiniBand device |
| `--ibv-port` | Port number of the IBV device | `1` |
| `--expander-configuration` | I2C expander configuration (0 or 1) | `0` |
| `--pattern` | Configure to display a test pattern (0-11) | None |
| `--ptp-sync` | After reset, wait for PTP time to synchronize | `False` |
| `--skip-reset` | Don't call reset on the Hololink device | `False` |
Benchmarking¶
Please refer to Holoscan Benchmarking for instructions on benchmarking this workflow.