
Tracks2Endo4D#

Authors: Holoscan Team (NVIDIA)
Supported platforms: x86_64, aarch64
Language: Python
Last modified: March 13, 2026
Latest version: 1.0.0
Minimum Holoscan SDK version: 3.8.0
Tested Holoscan SDK versions: 3.8.0
Contribution metric: Level 2 - Trusted

A GPU-accelerated application for real-time 3D point tracking and camera parameter estimation from video, built on NVIDIA Holoscan.

TracksTo4D visualization

Overview#

Tracks2Endo4D combines state-of-the-art point tracking with 3D reconstruction to:

  • Track arbitrary points across video frames using persistent 2D point tracking
  • Reconstruct 3D structure from 2D tracks in a single feed-forward pass
  • Estimate camera parameters including intrinsics (focal length, principal point) and extrinsics (camera pose/trajectory)
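As a point of reference for the intrinsics above, a pinhole camera's focal length and principal point form a 3x3 matrix K that maps camera-space points to pixel coordinates. A minimal numpy sketch (illustrative values, not application code):

```python
import numpy as np

def intrinsics_matrix(fx, fy, cx, cy):
    """Assemble a 3x3 pinhole intrinsic matrix from focal length (fx, fy)
    and principal point (cx, cy)."""
    return np.array([
        [fx, 0.0, cx],
        [0.0, fy, cy],
        [0.0, 0.0, 1.0],
    ])

def project(K, point_3d):
    """Project a 3D camera-space point to pixel coordinates."""
    p = K @ point_3d
    return p[:2] / p[2]

K = intrinsics_matrix(800.0, 800.0, 320.0, 240.0)
```

A point on the optical axis, e.g. `(0, 0, 2)`, projects exactly onto the principal point `(320, 240)`.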

Architecture#

Core Technologies#

Component | Description | Link
TapNext | "Tracking Any Point" reformulated as next-token prediction for robust long-range point tracking | GitHub
TracksTo4D | NVIDIA Research's encoder-based method that infers 3D structure and camera motion from 2D tracks without 3D supervision | Project Page
Holoscan SDK | NVIDIA's platform for building high-performance streaming AI applications | Documentation

Requirements#

Hardware#

  • NVIDIA GPU with CUDA 12+ and Vulkan support
  • Display configured for X11 (for visualization)

Software#

  • Docker with NVIDIA Container Toolkit
  • Holoscan SDK >= v3.8.0: The Holohub container handles this dependency automatically.

Models#

This application uses the following AI models:

Model | Description | Source
TapNext Init | Initialization model for point tracking | Converted from PyTorch to ONNX during Docker build
TapNext Forward | Forward pass model for point tracking | Converted from PyTorch to ONNX during Docker build
TracksTo4D | 3D reconstruction from 2D tracks | Downloaded with sample data from NGC

The TapNext models are not hosted as pre-built ONNX files. Instead, the Dockerfile clones the TapNet repository, downloads the official PyTorch checkpoint, and converts the models to ONNX format on the fly during the Docker image build. All ONNX models are then converted to TensorRT engines (BF16 precision) at CMake build time.

Sample Data#

Sample video data and the TracksTo4D model are automatically downloaded from NGC during the build process.

Quick Start Guide#

The entire application runs inside a Docker container. The first run builds the container image (which includes the PyTorch-to-ONNX conversion for TapNext models), downloads sample data, converts ONNX models to TensorRT, and launches the application:

./holohub run tracks2endo4d

This command will:

  1. Build the Docker container image (includes TapNext ONNX conversion from PyTorch)
  2. Launch the container
  3. Download sample data and the TracksTo4D model from NGC
  4. Copy the TapNext ONNX models (generated during the Docker build) into the data directory
  5. Convert all ONNX models to TensorRT engines (BF16 precision)
  6. Build and run the application

The build produces the following TensorRT engine files:

ONNX Model | TensorRT Engine
tapnext_init.onnx | tapnext_init.bf16.engine
tapnext_forward.onnx | tapnext_forward.bf16.engine
tracksto4d.onnx | tracksto4d.bf16.engine

Important: TensorRT engines are GPU-architecture specific. You must rebuild the engines when switching to a GPU with a different architecture.

Subsequent Runs#

Once the Docker image is built and TensorRT engines have been generated, subsequent runs reuse them. To skip the TensorRT conversion on subsequent runs:

./holohub run tracks2endo4d --configure-args "-DCONVERT_ENGINE=OFF"

Advanced Usage#

Using Holohub Container#

First, launch the Holohub container:

./holohub run-container tracks2endo4d

Building the Application#

Once your environment is set up, you can build the application using the following command:

./holohub build tracks2endo4d

To force TensorRT engine re-conversion (e.g., after switching GPUs):

./holohub build tracks2endo4d --configure-args "-DCONVERT_ENGINE=ON"

Running the Application#

From Outside the Container#

Run the application using the Holohub container (builds if needed):

./holohub run tracks2endo4d

To skip the build step:

./holohub run tracks2endo4d --no-build

From Inside the Container#

You can also run the application directly:

cd <HOLOHUB_SOURCE_DIR>/applications/tracks2endo4d
python3 tracks2endo4d_app.py --data <DATA_DIR> --model <MODEL_DIR>

TIP: You can get the exact "Run command" along with "Run environment" and "Run workdir" by executing:

./holohub run tracks2endo4d --dryrun --local

CMake Build Options#

This application supports the following CMake options that can be passed via --configure-args:

Option | Description | Default
CONVERT_ENGINE | Convert ONNX models to TensorRT engines during build | ON

Example usage:

./holohub build tracks2endo4d --configure-args "-DCONVERT_ENGINE=OFF"

Command Line Arguments#

The application accepts the following command line arguments:

Argument | Description | Default
--source | Source of video input: replayer or aja | replayer
-d, --data | Path to data directory containing videos | HOLOHUB_DATA_PATH environment variable
-m, --model | Path to model directory containing TensorRT engines | HOLOHUB_DATA_PATH environment variable
--viz-2d | Enable 2D visualization overlay | False
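A minimal argparse sketch of this interface (defaults mirror the table above, not the actual application source):

```python
import argparse
import os

def build_parser():
    """Sketch of the CLI described above; the real parser in
    tracks2endo4d_app.py may differ."""
    parser = argparse.ArgumentParser(description="Tracks2Endo4D")
    parser.add_argument("--source", choices=["replayer", "aja"],
                        default="replayer", help="Source of video input")
    parser.add_argument("-d", "--data",
                        default=os.environ.get("HOLOHUB_DATA_PATH"),
                        help="Path to data directory containing videos")
    parser.add_argument("-m", "--model",
                        default=os.environ.get("HOLOHUB_DATA_PATH"),
                        help="Path to model directory with TensorRT engines")
    parser.add_argument("--viz-2d", action="store_true",
                        help="Enable 2D visualization overlay")
    return parser
```

For example, `build_parser().parse_args(["--source", "aja", "--viz-2d"])` selects the AJA input path with the 2D overlay enabled.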

Configuration#

The application is configured via config.yaml. Key parameters include:

Section | Parameter | Description
replayer | basename | Video file basename (without extension)
replayer | frame_rate | Playback frame rate
window | window_size | Temporal window for tracking
window | overlap_size | Overlap between consecutive windows
window | grid_size | Grid size for point sampling
preprocessor_3d | calibration_matrix | Camera intrinsic matrix (if known)
tapnext | model_file_path_* | Paths to TensorRT engines
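To illustrate how the window parameters interact, here is a hypothetical reader for the `window` section; the key names follow the table above, but the full schema lives in the application's config.yaml.

```python
# Hypothetical sketch: consecutive temporal windows of window_size frames
# that share overlap_size frames, as described by the window section.
import yaml

sample = """
window:
  window_size: 16
  overlap_size: 4
  grid_size: 20
"""
win = yaml.safe_load(sample)["window"]

def window_starts(num_frames, window_size, overlap_size):
    """First frame index of each temporal window; consecutive windows
    advance by window_size - overlap_size frames."""
    stride = window_size - overlap_size
    return list(range(0, max(num_frames - overlap_size, 1), stride))

starts = window_starts(64, win["window_size"], win["overlap_size"])
```

With a window of 16 and an overlap of 4, windows start every 12 frames, so each new window re-tracks the last 4 frames of its predecessor.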

Using Your Own Videos#

To use custom videos, you must first convert them to GXF entity format. The conversion script is included in the Holoscan Docker container.

See the official instructions in the Holoscan SDK repo: 📄 convert_video_to_gxf_entities.py

Once converted, update the replayer/basename parameter in config.yaml to point to your new video file (without extension).

Using AJA Card as I/O#

To use an AJA capture card for real-time input:

./holohub run tracks2endo4d --run-args "--source aja"

Note: The AJA video buffer dtype defaults to rgba8888. If your camera does not provide an alpha channel, change in_dtype to rgb888 in the aja_format_converter section of the config.yaml file.
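If you prefer to make that change programmatically, a hypothetical helper might look like the following; the section and key names follow the note above, but the real file layout should be checked against the shipped config.yaml.

```python
# Hypothetical helper that flips the AJA converter's input dtype in a
# YAML config file; demonstrated on a throwaway file, not the real config.
import tempfile
import yaml

def set_aja_in_dtype(path, dtype="rgb888"):
    """Rewrite in_dtype under aja_format_converter, preserving other keys."""
    with open(path) as f:
        cfg = yaml.safe_load(f)
    cfg.setdefault("aja_format_converter", {})["in_dtype"] = dtype
    with open(path, "w") as f:
        yaml.safe_dump(cfg, f)

# Demo on a temporary file mimicking the relevant section.
demo = tempfile.NamedTemporaryFile("w", suffix=".yaml", delete=False)
demo.write("aja_format_converter:\n  in_dtype: rgba8888\n")
demo.close()
set_aja_in_dtype(demo.name)
with open(demo.name) as f:
    updated = yaml.safe_load(f)
```

After the call, the file's `in_dtype` reads `rgb888` while any other sections are left untouched.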

References#