Real-Time End-to-end AI Surgical Video Workflow#
Authors: Holoscan Team (NVIDIA)
Supported platforms: x86_64, aarch64
Language: Python
Last modified: May 13, 2025
Latest version: 1.0
Minimum Holoscan SDK version: 3.0.0
Tested Holoscan SDK versions: 3.1.0
Contribution metric: Level 1 - Highly Reliable
Fig.1: The overall diagram illustrating the end-to-end pipeline for real-time AI surgical video processing. The pipeline achieves an average end-to-end latency of 37ms (maximum 54ms). Key latency components are shown: Holoscan Sensor Bridge (HSB) latency averages 21ms (max 28ms), and the AI application averages 16ms (median 17ms, 95th percentile 18ms, 99th percentile 22ms, max 26ms). These results demonstrate the solution's high-performance, low-latency capabilities for demanding surgical video applications.
Overview#
This reference application offers developers a modular, end-to-end pipeline that spans the entire sensor processing workflow—from sensor data ingestion and accelerated computing to AI inference, real-time visualization, and data stream output.
Specifically, we demonstrate a comprehensive real-time end-to-end AI surgical video pipeline that includes:
- Sensor I/O: Integration with the Holoscan Sensor Bridge, enabling GPUDirect data ingestion for ultra-low-latency input of surgical video feeds.
- Out-of-body detection to determine if the endoscope is inside or outside the patient's body, ensuring patient privacy by removing identifiable information.
- Dynamic flow control based on the out-of-body detection result.
- De-identification: pixelation of the image to anonymize out-of-body content such as people's faces (see the sketch after this list).
- Multi-AI: simultaneous execution of multiple models at inference time for surgical tool processing:
- SSD model for detecting surgical tools with bounding boxes
- MONAI model for pixel-level segmentation of endoscopic tools
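The de-identification step above amounts to coarse, block-wise pixelation of the whole frame whenever the camera is outside the body. Below is a minimal sketch of such a pixelation routine using CuPy so the frame can stay on the GPU; the function name, block size, and HxWxC layout are illustrative assumptions, not the workflow's actual implementation.

```python
import cupy as cp  # assumption: frames are already on the GPU as CuPy arrays


def pixelate(frame: cp.ndarray, block: int = 16) -> cp.ndarray:
    """Coarsen an H x W x C frame by averaging each (block x block) tile."""
    h, w, c = frame.shape
    # Crop to a multiple of the block size to keep the reshape simple.
    h2, w2 = (h // block) * block, (w // block) * block
    tiles = frame[:h2, :w2].reshape(h2 // block, block, w2 // block, block, c)
    means = tiles.mean(axis=(1, 3), keepdims=True)  # per-tile mean color
    pixelated = cp.broadcast_to(means, tiles.shape).reshape(h2, w2, c)
    return pixelated.astype(frame.dtype)
```

A larger block size removes more identifiable detail at the cost of a coarser preview; the workflow's own de-identification operator may be implemented differently.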
Architecture#
Fig.2: The workflow diagram representing all the Holoscan operators (in green) and Holoscan Sensor Bridge operators (in yellow). The source can be a Holoscan Sensor Bridge, an AJA card, or a video replayer.
Fig.3: Endoscopy image from a partial nephrectomy procedure (surgical removal of the diseased portion of the kidney) showing AI tool segmentation results when the camera is inside the body and a de-identified (pixelated) output image when the camera is outside of the body.
1. Out-of-Body Detection#
The workflow first determines if the endoscope is inside or outside the patient's body using an AI model.
2. Dynamic Flow Control#
- If outside the body: the video is de-identified through pixelation to protect patient privacy
- If inside the body: the video is processed by the multi-AI pipeline
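One way to express this routing in a Holoscan Python application is a small gate operator that receives each frame together with the classifier score and emits it on one of two output ports. This is a minimal sketch under assumed port and tensor names (`frame`, `oob_score`, `in_body`, `out_of_body`), not the workflow's actual operator; recent Holoscan SDK releases also offer dynamic flow control APIs that can replace this pattern.

```python
from holoscan.core import Operator, OperatorSpec


class OutOfBodyRouterOp(Operator):
    """Hypothetical gate: forwards each frame to the branch selected by the classifier."""

    def __init__(self, fragment, *args, threshold=0.5, **kwargs):
        self.threshold = threshold  # score above which the frame is treated as out-of-body
        super().__init__(fragment, *args, **kwargs)

    def setup(self, spec: OperatorSpec):
        spec.input("frame")
        spec.input("oob_score")
        spec.output("in_body")
        spec.output("out_of_body")

    def compute(self, op_input, op_output, context):
        frame = op_input.receive("frame")
        score = float(op_input.receive("oob_score"))  # assumed to arrive as a scalar
        if score >= self.threshold:
            op_output.emit(frame, "out_of_body")  # frame goes to the pixelation branch
        else:
            op_output.emit(frame, "in_body")      # frame goes to the multi-AI branch
```

Because only one port emits per frame, the downstream operators' input ports would need a non-blocking condition (e.g. `ConditionType.NONE`) so the unselected branch does not stall the pipeline.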
3. Multi-AI Processing#
When inside the body, two AI models run concurrently:
- SSD detection model identifies surgical tools with bounding boxes
- MONAI segmentation model provides pixel-level segmentation of tools
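Both models can be served by a single Holoscan inference operator configured with two entries in its model map and parallel inference enabled. The sketch below shows only the operator construction, not the full graph; the tensor names and relative model paths are assumptions for illustration, while the real workflow drives `InferenceOp` from its `config.yaml`.

```python
from holoscan.core import Application
from holoscan.operators import InferenceOp
from holoscan.resources import UnboundedAllocator


class MultiAISketch(Application):
    def compose(self):
        # Relative paths follow the data layout described under "Sample Data";
        # the real application resolves them from the --data directory.
        multi_ai = InferenceOp(
            self,
            name="multi_ai_inference",
            backend="trt",  # TensorRT backend
            model_path_map={
                "ssd_tool_detection": "ssd_model/epoch24_nms.onnx",
                "tool_segmentation": "monai_tool_seg_model/"
                "model_endoscopic_tool_seg_sanitized_nhwc_in_nchw_out.onnx",
            },
            # Input tensors each model consumes (hypothetical names).
            pre_processor_map={
                "ssd_tool_detection": ["ssd_preprocessed"],
                "tool_segmentation": ["seg_preprocessed"],
            },
            # Output tensors each model produces (hypothetical names).
            inference_map={
                "ssd_tool_detection": ["ssd_output"],
                "tool_segmentation": ["seg_output"],
            },
            parallel_inference=True,  # run both models concurrently
            allocator=UnboundedAllocator(self, name="allocator"),
        )
```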
4. Visualization#
The HolovizOp displays the processed video with overlaid AI results, including:
- Bounding boxes around detected tools
- Segmentation masks for tools
- Text labels for detected tools
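In Holoviz, each overlay is described as an input tensor layer with a type such as `color`, `rectangles`, `text`, or `color_lut`. Here is a minimal sketch of such a layer specification; the tensor names (`video`, `ssd_boxes`, `ssd_labels`, `tool_mask`), colors, and label strings are illustrative assumptions, and the actual layer definitions live in the workflow's configuration.

```python
from holoscan.core import Application
from holoscan.operators import HolovizOp


class VizSketch(Application):
    def compose(self):
        visualizer = HolovizOp(
            self,
            name="holoviz",
            window_title="AI Surgical Video",
            tensors=[
                # Background video frame (hypothetical tensor name).
                dict(name="video", type="color", opacity=1.0, priority=0),
                # Bounding boxes produced by the SSD postprocessor.
                dict(name="ssd_boxes", type="rectangles", color=[1.0, 0.0, 0.0, 1.0],
                     line_width=2.0, priority=1),
                # Text labels anchored at the detection positions.
                dict(name="ssd_labels", type="text", text=["tool"],
                     color=[1.0, 1.0, 1.0, 1.0], priority=2),
                # Segmentation mask colorized through the lookup table below.
                dict(name="tool_mask", type="color_lut", opacity=0.5, priority=1),
            ],
            # Two-entry lookup table: background transparent, tool class green.
            color_lut=[[0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.7]],
        )
```

Layer `priority` controls draw order and `opacity` lets the segmentation mask blend over the video frame.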
Requirements#
Software#
- Holoscan SDK v3.0: The Holohub command takes care of this dependency when using the Holohub container. Alternatively, you can install the Holoscan SDK via one of the methods specified in the SDK user guide.
- Holoscan Sensor Bridge v2.0: Please see the quick start guide for building the Holoscan Sensor Bridge Docker container.
Models#
This workflow utilizes the following three AI models:
Model | Description | File |
---|---|---|
📦️ Out-of-body Detection Model | Detects if endoscope is inside or outside the body | anonymization_model.onnx |
📦️ SSD Detection for Endoscopy Surgical Tools | Detects surgical tools with bounding boxes | epoch24_nms.onnx |
📦️ MONAI Endoscopic Tool Segmentation | Provides pixel-level segmentation of tools | model_endoscopic_tool_seg_sanitized_nhwc_in_nchw_out.onnx |
Sample Data#
- 📦️ Orsi partial nephrectomy procedures - Sample endoscopy video data for use with the `replayer` source

Note: The directory specified by `--data` at runtime is assumed to contain three subdirectories, corresponding to the NGC resources specified in Models and Sample Data: `orsi`, `monai_tool_seg_model`, and `ssd_model`. These resources will be automatically downloaded to the Holohub data directory when building the application.
Quick Start Guide#
Using AJA Card or Replayer as I/O#
./dev_container build_and_run ai_surgical_video
Using Holoscan Sensor Bridge as I/O#
When using the workflow with `--source hsb`, the Holoscan Sensor Bridge software must be installed. You can build a Holoscan Sensor Bridge container using the following commands:
git clone https://github.com/nvidia-holoscan/holoscan-sensor-bridge.git
cd holoscan-sensor-bridge
git checkout hsdk-3.0
./docker/build.sh
This will build a Docker image called `hololink-demo:2.0.0`.
Once you have built the Holoscan Sensor Bridge container, you can build the Holohub container and run the workflow using the following command:
./dev_container build_and_run --base_img hololink-demo:2.0.0 --img holohub:link ai_surgical_video --run_args " --source hsb"
Advanced Usage#
Building the Application#
First, you need to run the Holohub container:
./dev_container launch --img holohub:link
Then, you can build the workflow using the following command:
./run build ai_surgical_video
Running the Application#
Using the Holohub Container from Outside the Container#
Using the Holohub container, you can run the workflow without building it again:
./dev_container build_and_run --base_img hololink-demo:2.0.0 --img holohub:link --no_build ai_surgical_video
However, if you want to build the workflow, you can just remove the `--no_build` flag:
./dev_container build_and_run --base_img hololink-demo:2.0.0 --img holohub:link ai_surgical_video
Using the Holohub Container from Inside the Container#
First, you need to run the Holohub container:
./dev_container launch --img holohub:link
To run the Python application, you can make use of the run script:
./run launch ai_surgical_video
Alternatively, you can run the application directly:
cd <HOLOHUB_SOURCE_DIR>/workflows/ai_surgical_video/python
python3 ai_surgical_video.py --source hsb --data <DATA_DIR> --config <CONFIG_FILE>
TIP: You can get the exact "Run command" along with "Run environment" and "Run workdir" by executing:
./run launch ai_surgical_video --dryrun
Command Line Arguments#
The application accepts the following command line arguments:
Argument | Description | Default |
---|---|---|
`-s, --source` | Source of video input: `replayer`, `aja`, or `hsb` | `replayer` |
`-c, --config` | Path to a custom configuration file | `config.yaml` in the application directory |
`-d, --data` | Path to the data directory containing model and video files | Uses the `HOLOHUB_DATA_PATH` environment variable |
`--headless` | Run in headless mode (no visualization) | `False` |
`--fullscreen` | Run in fullscreen mode | `False` |
`--camera-mode` | Camera mode to use [0, 1, 2, 3] | `0` |
`--frame-limit` | Exit after receiving this many frames | No limit |
`--hololink` | IP address of the Hololink board | `192.168.0.2` |
`--log-level` | Logging level to display | `20` |
`--ibv-name` | IBV device to use | First available InfiniBand device |
`--ibv-port` | Port number of the IBV device | `1` |
`--expander-configuration` | I2C expander configuration (0 or 1) | `0` |
`--pattern` | Configure to display a test pattern (0-11) | None |
`--ptp-sync` | After reset, wait for PTP time to synchronize | `False` |
`--skip-reset` | Don't call reset on the Hololink device | `False` |
Benchmarking#
Please refer to Holoscan Benchmarking for how to perform benchmarking for this workflow.