Surgical Scene Reconstruction with Gaussian Splatting#
Authors: Holoscan Team (NVIDIA)
Supported platforms: x86_64
Language: Python
Last modified: November 21, 2025
Latest version: 1.0
Minimum Holoscan SDK version: 3.7.0
Tested Holoscan SDK versions: 3.7.0
Contribution metric: Level 2 - Trusted
This application demonstrates real-time 3D surgical scene reconstruction by combining Holoscan SDK for high-performance streaming, 3D Gaussian Splatting for neural 3D representation, and temporal deformation networks for accurate modeling of dynamic tissue.

The application provides a complete end-to-end pipeline, from raw surgical video to real-time 3D reconstruction. Researchers and developers can use it to train custom models on their own endoscopic data and visualize results with GPU-accelerated rendering.
Features of this application include:
- Real-time Visualization: Stream surgical scene reconstruction at >30 FPS using Holoscan
- Temporal Deformation: Accurate per-frame tissue modeling as it deforms over time
- Tool Removal: Tissue-only reconstruction mode (surgical instruments automatically excluded)
- End-to-End Training: On-the-fly model training from streamed endoscopic data
- Two Operation Modes: Inference-only (with pre-trained checkpoint) OR train-then-render
- Production Ready: Tested and optimized Holoscan pipeline with complete Docker containerization
The application takes EndoNeRF surgical datasets as input (RGB images, stereo depth, camera poses, and tool masks), processes them with multi-frame Gaussian Splatting and a 4D spatiotemporal deformation network, and outputs a real-time 3D tissue reconstruction with surgical instruments removed.
It is ideal for use cases such as:
- Surgical scene understanding and visualization
- Tool-free tissue reconstruction for analysis
- Research in surgical vision and 3D reconstruction
- Development of real-time surgical guidance systems
Quick Start#
Step 1: Clone the HoloHub Repository#
git clone https://github.com/nvidia-holoscan/holohub.git
cd holohub
Step 2: Read and Agree to the Terms and Conditions of the EndoNeRF Sample Dataset#
- Read and agree to the Terms and Conditions for the EndoNeRF dataset.
- The EndoNeRF sample dataset is downloaded automatically when the application is built.
- Optionally, for manual download of the dataset, refer to the Data section below.
- Optionally, if you do not agree to the terms and conditions, set the `HOLOHUB_DOWNLOAD_DATASETS` environment variable to `OFF` and manually download the dataset, placing it in the correct location by following the instructions in the Data section below:

  export HOLOHUB_DOWNLOAD_DATASETS=OFF
Step 3: Run Training#
To run the model training:
./holohub run surgical_scene_recon train
Step 4: Dynamic Rendering with a Trained Model#
After training completes, to visualize your results in real-time, run the surgical render:
./holohub run surgical_scene_recon render

Obtaining the Pulling Soft Tissues Dataset#
This application uses the EndoNeRF "pulling_soft_tissues" dataset, which contains:
- 63 RGB endoscopy frames (640×512 pixels)
- Corresponding depth maps
- Tool segmentation masks for instrument removal
- Camera poses and bounds (poses_bounds.npy)
Download the Dataset#
You can download the dataset from one of the following locations:
- Direct Google Drive: https://drive.google.com/drive/folders/1zTcX80c1yrbntY9c6-EK2W2UVESVEug8?usp=sharing
  - In the Google Drive folder, you'll see two folders: `cutting_tissues_twice` and `pulling_soft_tissues`.
  - Download `pulling_soft_tissues`.
- Visit the EndoNeRF repository.
Dataset Setup#
The dataset will be automatically used by the application when placed in the correct location. Refer to the HoloHub glossary for definitions of HoloHub-specific directory terms used below.
To place the dataset at `<HOLOHUB_ROOT>/data/EndoNeRF/pulling/`:
- From the HoloHub root directory, create the destination directory:

  mkdir -p data/EndoNeRF

- Extract and move (or copy) the downloaded dataset:

  mv /path/to/pulling_soft_tissues data/EndoNeRF/pulling

Important: The dataset must be physically present at the path above; do not use symlinks. Docker containers cannot follow symlinks outside mounted volumes.
Verify the Dataset Structure#
Verify that your dataset has this structure:
<HOLOHUB_ROOT>/
└── data/
    └── EndoNeRF/
        └── pulling/
            ├── images/           # 63 RGB frames (.png)
            ├── depth/            # 63 depth maps (.png)
            ├── masks/            # 63 tool masks (.png)
            └── poses_bounds.npy  # Camera poses (8.5 KB)
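A short script like the following can sanity-check the layout before training (the `check_dataset` helper and its messages are illustrative, not part of the application):

```python
import os

REQUIRED_DIRS = ["images", "depth", "masks"]
REQUIRED_FILES = ["poses_bounds.npy"]

def check_dataset(root):
    """Return a list of problems with the EndoNeRF dataset layout under `root`."""
    problems = []
    if os.path.islink(root):
        problems.append(f"{root} is a symlink; the dataset must be a real directory")
    for d in REQUIRED_DIRS:
        path = os.path.join(root, d)
        if not os.path.isdir(path):
            problems.append(f"missing directory: {path}")
    for f in REQUIRED_FILES:
        path = os.path.join(root, f)
        if not os.path.isfile(path):
            problems.append(f"missing file: {path}")
    return problems

if __name__ == "__main__":
    for msg in check_dataset("data/EndoNeRF/pulling") or ["dataset layout looks correct"]:
        print(msg)
```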
Models Used by the surgical_scene_recon Application#
The surgical_scene_recon application uses a 4D Dynamic Gaussian Splatting model that combines:
- 3D Gaussian Splatting model - a point-based neural scene representation
- HexPlane Temporal Deformation Network - a spatiotemporal feature grid with MLPs for modeling tissue dynamics

Gaussian Splatting Model
The Gaussian Splatting model can be described as:
- Architecture: 3D Gaussians with learned position, scale, rotation, opacity, and color
- Initialization: Multi-frame point cloud (~30,000-50,000 points from all frames)
- Renderer: `gsplat` library (CUDA-accelerated differentiable rasterization)
- Color: Spherical Harmonics of degree 3 (16 coefficients per Gaussian for view-dependent color)
- Resolution: 640×512 pixels (RGB, three channels)
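As a rough sketch of what this parameterization amounts to, the per-Gaussian attribute counts can be tallied as follows (illustrative arithmetic only; the real model stores these attributes as GPU tensors):

```python
# Spherical Harmonics of degree 3 give (3 + 1)^2 = 16 coefficients per channel.
SH_DEGREE = 3
SH_COEFFS = (SH_DEGREE + 1) ** 2

def gaussian_param_count(num_gaussians):
    """Rough per-scene parameter count for the attributes listed above."""
    per_gaussian = (
        3                # position (x, y, z)
        + 3              # scale (per-axis)
        + 4              # rotation (quaternion)
        + 1              # opacity
        + 3 * SH_COEFFS  # view-dependent color (RGB x 16 SH coefficients)
    )
    return num_gaussians * per_gaussian

# A scene of ~50,000 splats therefore carries on the order of
# 50,000 * (3 + 3 + 4 + 1 + 48) = 50,000 * 59 = 2,950,000 learned parameters.
```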
Temporal Deformation Network Model
The Temporal Deformation Network model deforms 3D Gaussians and can be described as:
- Architecture: HexPlane 4D spatiotemporal grid + MLP decoder
- Input: 3D position + normalized time value [0, 1]
- Output: Deformed position, scale, rotation, and opacity changes
- Training: Two-stage process (coarse: static, fine: with deformation)
- Inference: Direct PyTorch (no conversion, full precision)
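The input/output contract of such a deformation network can be sketched in plain Python (a stand-in for the actual HexPlane grid + MLP decoder, whose weights are learned during training; `toy_network` is a hypothetical placeholder):

```python
def deform_gaussian(position, scale, rotation, opacity, t, predict_deltas):
    """Apply a learned deformation to one Gaussian at normalized time t in [0, 1].

    `predict_deltas` stands in for the HexPlane grid + MLP decoder: it maps
    (position, t) to small additive changes in each attribute.
    """
    assert 0.0 <= t <= 1.0, "time must be normalized to [0, 1]"
    d_pos, d_scale, d_rot, d_opacity = predict_deltas(position, t)
    new_position = [p + d for p, d in zip(position, d_pos)]
    new_scale = [s + d for s, d in zip(scale, d_scale)]
    new_rotation = [r + d for r, d in zip(rotation, d_rot)]
    new_opacity = opacity + d_opacity
    return new_position, new_scale, new_rotation, new_opacity

# A trivial stand-in network: no deformation at t = 0, linear drift afterwards.
def toy_network(position, t):
    return [0.01 * t] * 3, [0.0] * 3, [0.0] * 4, 0.0
```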
About the Model Training Process#
The application trains in two stages:
- The Coarse Stage, in which the application learns the base static Gaussian model without deformation.
- The Fine Stage, in which a temporal deformation network is added for dynamic tissue modeling.
The training uses:
- Multi-modal Data: RGB images, depth maps, tool segmentation masks
- Loss Functions: RGB loss, depth loss, TV loss, masking losses
- Optimization: Adam optimizer with batch-size scaled learning rates
- Tool Removal: Segmentation masks applied during training for tissue-only reconstruction
The training pipeline (gsplat_train.py) runs in the following order:
- Data Loading: the EndoNeRF parser loads RGB images, depth maps, masks, and poses.
- Initialization: a multi-frame point cloud (~30k points) provides the starting Gaussians.
- Training happens in two stages:
  - Coarse
  - Fine
- Optimization: the Adam (Adaptive Moment Estimation) optimizer with batch-size scaled learning rates.
- Regularization: depth loss, TV loss, and masking losses are applied to the data.
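The loss terms above can be illustrated on toy per-pixel data (scalar pixel lists and hypothetical weights; the actual implementation in gsplat_train.py operates on image tensors with its own loss weighting):

```python
def masked_l1(pred, target, mask):
    """Mean L1 error over pixels where mask == 1 (tool pixels carry mask == 0)."""
    errors = [abs(p - t) for p, t, m in zip(pred, target, mask) if m == 1]
    return sum(errors) / len(errors)

def tv_loss(row):
    """1D total-variation regularizer: penalizes differences between neighbors."""
    return sum(abs(a - b) for a, b in zip(row, row[1:]))

# Hypothetical loss weights; the real training config defines its own values.
W_RGB, W_DEPTH, W_TV = 1.0, 0.5, 0.01

def total_loss(rgb_pred, rgb_gt, depth_pred, depth_gt, mask):
    """Combine masked RGB, masked depth, and TV terms into one scalar loss."""
    return (W_RGB * masked_l1(rgb_pred, rgb_gt, mask)
            + W_DEPTH * masked_l1(depth_pred, depth_gt, mask)
            + W_TV * tv_loss(rgb_pred))
```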
The default training command trains a model on all 63 frames with 2000 iterations, producing smooth temporal deformation and high-quality reconstruction.
Training outputs are saved to <HOLOHUB_APP_BIN>/output/trained_model/, where <HOLOHUB_APP_BIN> is <HOLOHUB_ROOT>/build/surgical_scene_recon/applications/surgical_scene_recon/ by default.
- `ckpts/fine_best_psnr.pt` - Best checkpoint (use for rendering)
- `ckpts/fine_step00XXX.pt` - Regular checkpoints
- `logs/` - Training logs
- `tb/` - TensorBoard logs
- `renders/` - Saved render frames
You can view training logs using TensorBoard:
tensorboard --logdir <HOLOHUB_APP_BIN>/output/trained_model/tb
Holoscan Pipeline Architecture#
The real-time rendering uses the following Holoscan operators:
EndoNeRFLoaderOp → GsplatLoaderOp → GsplatRenderOp → HolovizOp
                                          ↓
                                    ImageSaverOp
- EndoNeRFLoaderOp: Streams camera poses and timestamps
- GsplatLoaderOp: Loads checkpoint and deformation network
- GsplatRenderOp: Applies temporal deformation and renders
- HolovizOp: Real-time GPU-accelerated visualization
- ImageSaverOp: Optional frame saving
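The dataflow of this operator chain can be mimicked in plain Python, outside the Holoscan runtime (all function and checkpoint names below are illustrative stand-ins for the real operator classes, not their actual implementations):

```python
def endonerf_loader(frame_idx, num_frames=63):
    """Stand-in for EndoNeRFLoaderOp: emit a camera pose and normalized timestamp."""
    return {"pose": f"pose_{frame_idx}", "t": frame_idx / (num_frames - 1)}

def gsplat_render(msg, model):
    """Stand-in for GsplatRenderOp: deform + rasterize (here: a tagged string)."""
    return f"rendered({msg['pose']}, model={model}, t={msg['t']:.2f})"

def run_pipeline(num_frames, save_frames=False):
    """Drive the chain frame by frame, mirroring the operator graph above."""
    model = "fine_best_psnr.pt"  # loaded once up front, as GsplatLoaderOp does
    saved = []
    for i in range(num_frames):
        frame = gsplat_render(endonerf_loader(i), model)
        # HolovizOp would display `frame` here; ImageSaverOp optionally saves it.
        if save_frames:
            saved.append(frame)
    return saved
```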
Requirements for the surgical_scene_recon Application#
- Hardware:
- NVIDIA GPU (RTX 3000+ series recommended, tested on RTX 6000 Ada Generation)
- ~2 GB free disk space (for the dataset)
- ~30 GB free disk space (for Docker containers)
- Software:
- Docker with NVIDIA GPU support
- X11 display server (for visualization)
- Holoscan SDK 3.7.0 or later (automatically provided in containers)
Application Integration Testing#
Integration tests are provided. To test the application for both training and inference, run:
./holohub test surgical_scene_recon --verbose
Performance#
Tested Configuration:
- GPU: NVIDIA RTX 6000 Ada Generation
- Container: Holoscan SDK 3.7.0
- Training Time: ~5 minutes (63 frames, 2000 iterations)
- Rendering: Real-time >30 FPS
Quality Metrics (train mode):
- PSNR: ~36-38 dB
- SSIM: ~0.80
- Gaussians: ~50,000 splats
- Deformation: Smooth temporal consistency
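For reference, PSNR is derived from the mean squared error as 10·log10(MAX²/MSE); a minimal version for 8-bit pixel values (a generic formula, not the project's evaluation code):

```python
import math

def psnr(pred, target, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```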
Troubleshooting#
Problem: "FileNotFoundError: poses_bounds.npy not found"#
- Cause: Dataset not in the correct location, or a symlink was used
- Solution: Ensure the dataset is physically at `<HOLOHUB_ROOT>/data/EndoNeRF/pulling/`
- Verify: Run `file data/EndoNeRF`; it should report "directory", not "symbolic link"
Problem: "Unable to find image holohub-surgical_scene_recon"#
- Cause: Container not built yet
- Solution: Remove the `--no-docker-build` flag (let the CLI build the container automatically)
- Or: Build the container manually: `./holohub build-container surgical_scene_recon`
Problem: Holoviz window doesn't appear#
- Cause: X11 forwarding not enabled
- Solution: Run `xhost +local:docker` before training
- Verify: Check that `echo $DISPLAY` shows a value
Problem: GPU out of memory#
- Cause: Another process using GPU
- Solution: Check `nvidia-smi` and stop other GPU processes
- Or: Reduce the batch size (advanced; edit the training config)
Acknowledgements#
Citation#
If you use this work, cite the following:
- EndoNeRF:
@inproceedings{wang2022endonerf,
title={EndoNeRF: Neural Rendering for Stereo 3D Reconstruction of Deformable Tissues in Robotic Surgery},
author={Wang, Yuehao and Yifan, Wang and Tao, Rui and others},
booktitle={MICCAI},
year={2022}
}
- 3D Gaussian Splatting:
@article{kerbl20233d,
title={3d gaussian splatting for real-time radiance field rendering},
author={Kerbl, Bernhard and Kopanas, Georgios and Leimk{\"u}hler, Thomas and Drettakis, George},
journal={ACM Transactions on Graphics},
year={2023}
}
- gsplat library:
@software{ye2024gsplat,
title={gsplat},
author={Ye, Vickie and Turkulainen, Matias and others},
year={2024},
url={https://github.com/nerfstudio-project/gsplat}
}
License#
This application is licensed under Apache 2.0. See individual files for specific licensing:
- Application code: Apache 2.0 (NVIDIA)
- Training utilities: MIT License (EndoGaussian Project)
- Spherical harmonics utils: BSD-2-Clause (PlenOctree)