📷🤖 Florence-2

Authors: Holoscan Team (NVIDIA)
Supported platforms: x86_64, aarch64
Last modified: March 18, 2025
Language: Python
Latest version: 1.0.0
Minimum Holoscan SDK version: 2.1.0
Tested Holoscan SDK versions: 2.1.0
Contribution metric: Level 1 - Highly Reliable

This application demonstrates how to run the Florence-2 models on a live video feed with the possibility of changing the task and optional prompt via a QT UI.

Note: This demo currently uses Florence-2-large-ft, but any of the Florence-2 models should work as long as the correct URLs and names are used in the Dockerfile and config.yaml:

  • Florence-2-large-ft
  • Florence-2-large
  • Florence-2-base-ft
  • Florence-2-base
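When switching variants, the model name and its download URL have to stay in sync. The sketch below is one way to derive both from a single variant name. It assumes the weights are pulled from the `microsoft` organization on Hugging Face; the helper name `model_url` is hypothetical, and the actual entries in the Dockerfile and config.yaml should be checked before relying on it.

```python
# Variant names as listed in this README.
FLORENCE2_VARIANTS = (
    "Florence-2-large-ft",
    "Florence-2-large",
    "Florence-2-base-ft",
    "Florence-2-base",
)


def model_url(variant: str, org: str = "microsoft") -> str:
    """Build a Hugging Face URL for a Florence-2 variant.

    Assumption: the model is hosted at https://huggingface.co/<org>/<variant>.
    """
    if variant not in FLORENCE2_VARIANTS:
        raise ValueError(f"Unknown Florence-2 variant: {variant!r}")
    return f"https://huggingface.co/{org}/{variant}"
```

Rejecting unknown names early makes a typo surface before a long container build starts.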

⚙️ Setup Instructions

The app defaults to using the video device at /dev/video0.

Note: You can use a USB webcam as the video source, or an MP4 video by following the instructions for the V4L2_Camera example app.

To check whether this is the correct device, install v4l-utils, which provides v4l2-ctl:

sudo apt-get install v4l-utils

To list your video devices, run:

v4l2-ctl --list-devices

This command will output something similar to this:

NVIDIA Tegra Video Input Device (platform:tegra-camrtc-ca):
        /dev/media0

vi-output, lt6911uxc 2-0056 (platform:tegra-capture-vi:0):
        /dev/video0

Dummy video device (0x0000) (platform:v4l2loopback-000):
        /dev/video3

Determine your desired video device and set it as the source device in config.yaml.
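On systems with many devices, the listing can be filtered programmatically. The following is a minimal sketch, not part of the application, that parses the text printed by `v4l2-ctl --list-devices` and keeps only the `/dev/videoN` nodes. The function names are hypothetical, and the parser assumes the layout shown above: an unindented description line ending in a colon, followed by indented device nodes.

```python
import re
import subprocess


def parse_v4l2_devices(listing: str) -> dict[str, list[str]]:
    """Map each device description to the device nodes listed beneath it."""
    devices: dict[str, list[str]] = {}
    current = None
    for line in listing.splitlines():
        if not line.strip():
            continue  # blank lines separate device groups
        if line[0].isspace():
            # Indented lines are nodes belonging to the most recent description.
            if current is not None:
                devices[current].append(line.strip())
        else:
            # Description lines end with a colon.
            current = line.rstrip().rstrip(":")
            devices.setdefault(current, [])
    return devices


def list_video_nodes(listing: str) -> list[str]:
    """Return only the /dev/videoN nodes, in the order they were listed."""
    return [
        node
        for nodes in parse_v4l2_devices(listing).values()
        for node in nodes
        if re.fullmatch(r"/dev/video\d+", node)
    ]


def query_v4l2() -> str:
    """Run v4l2-ctl and return its standard output (requires v4l-utils)."""
    result = subprocess.run(
        ["v4l2-ctl", "--list-devices"], capture_output=True, text=True, check=False
    )
    return result.stdout
```

Calling `list_video_nodes(query_v4l2())` on the target machine yields the candidate paths to put in config.yaml; media nodes such as /dev/media0 are filtered out.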

🚀 Build and Run Instructions

From the Holohub main directory run the following command:

./dev_container build_and_run florence-2-vision
Note: The first build will take ~1.5 hours if you're on ARM64. This is largely due to building Flash Attention 2 since pre-built wheels are not distributed for ARM64 platforms.

💻 Supported Hardware

  • IGX w/ dGPU
  • x86 w/ dGPU
  • IGX w/ iGPU and Jetson AGX supported with workaround
    There is a known issue running this application on IGX w/ iGPU and on Jetson AGX (see #500). The workaround is to update the device so that it does not pick up the libnvv4l2.so library:
cd /usr/lib/aarch64-linux-gnu/
ls -l libv4l2.so.0.0.999999
sudo rm libv4l2.so.0.0.999999
sudo ln -s libv4l2.so.0.0.0.0 libv4l2.so.0.0.999999
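
To confirm that the commands above took effect, the symlink target can be inspected. This is a small sketch under the assumption that the paths and file names match the ones used in the workaround; the function name `workaround_applied` is hypothetical.

```python
import os


def workaround_applied(
    lib_dir: str = "/usr/lib/aarch64-linux-gnu",
    link_name: str = "libv4l2.so.0.0.999999",
    expected_target: str = "libv4l2.so.0.0.0.0",
) -> bool:
    """Return True if link_name in lib_dir is a symlink to expected_target."""
    link_path = os.path.join(lib_dir, link_name)
    if not os.path.islink(link_path):
        return False
    # Compare by file name so relative and absolute targets are both accepted.
    return os.path.basename(os.readlink(link_path)) == expected_target
```

A result of `False` means the link is missing or still points at a different library.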

Dev Container

To start the Dev Container, run the following command from the root directory of Holohub:

./dev_container vscode florence-2-vision

This command will build and configure a Dev Container using a Dockerfile that is ready to run the application.

VS Code Launch Profiles

There are two launch profiles configured for this application:

  1. (debugpy) florence-2-vision/python: Launch florence-2-vision using a launch profile that enables debugging of Python code.
  2. (pythoncpp) florence-2-vision/python: Launch florence-2-vision using a launch profile that enables debugging of Python and C++ code.