
NVIDIA NV-CLIP

Authors: Holoscan Team (NVIDIA)
Supported platforms: x86_64, aarch64
Last modified: March 18, 2025
Language: Python
Latest version: 1.0
Minimum Holoscan SDK version: 1.0.3
Tested Holoscan SDK versions: 1.0.3, 2.1.0, 2.5.0
Contribution metric: Level 1 - Highly Reliable

NV-CLIP is a multimodal embeddings model for image and text. This sample application shows how to use the OpenAI SDK with an NVIDIA Inference Microservice (NIM), and it works with both NVIDIA-hosted NIMs from build.nvidia.com and self-hosted NIMs.
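
Conceptually, the application points the standard OpenAI Python client at a NIM endpoint and calls the embeddings API. The following is a minimal sketch of that idea, assuming the openai package is installed and API_KEY is exported; the endpoint, model name, and encoding format mirror the configuration shown further below, and the sample application's actual code may differ:

# Minimal sketch: querying an NV-CLIP NIM through the OpenAI Python SDK.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # NVIDIA-hosted NIM endpoint
    api_key=os.environ["API_KEY"],
)

# NV-CLIP is an embeddings model, so requests go to the embeddings endpoint
# rather than chat completions.
response = client.embeddings.create(
    input=["Which image contains a rabbit?"],
    model="nvidia/nvclip-vit-h-14",
    encoding_format="float",
)
print(len(response.data[0].embedding))  # dimensionality of the returned embedding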

Quick Start

Get your API Key and start the sample application.

  1. Enter your API key in nvidia_nim.yaml
  2. ./dev_container build_and_run nvidia_nim_nvclip

Advanced

Configuring the sample application

Use the nvidia_nim.yaml configuration file to configure the sample application:

NVIDIA-Hosted NV-CLIP NIM

By default, the application is configured to use NVIDIA-hosted NV-CLIP NIM.

nim:
 base_url: https://integrate.api.nvidia.com/v1
 api_key:

base_url: The URL of your NIM instance. Defaults to NVIDIA-hosted NIMs.
api_key: Your API key to access NVIDIA-hosted NIMs.

Note: you may also configure your API key using an environment variable. E.g., export API_KEY=...

# To use NVIDIA hosted NIMs available on build.nvidia.com, export your API key first
export API_KEY=[enter your API key here]
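
One way the application could resolve the key, reading nvidia_nim.yaml first and falling back to the API_KEY environment variable, is sketched below; the actual loading code in the sample application may differ:

# Hedged sketch: resolve the API key from nvidia_nim.yaml, falling back to
# the API_KEY environment variable. Key names follow the configuration above.
import os

import yaml  # PyYAML

with open("nvidia_nim.yaml") as f:
    nim_config = yaml.safe_load(f)["nim"]

api_key = nim_config.get("api_key") or os.environ.get("API_KEY")
if not api_key:
    raise RuntimeError("Set nim.api_key in nvidia_nim.yaml or export API_KEY")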

Self-Hosted NIMs

To use a self-hosted NIM, refer to the NV-CLIP NIM documentation to configure and start the NIM.

Then, comment out the NVIDIA-hosted section and uncomment the self-hosted configuration section in the nvidia_nim.yaml file.

nim:
  base_url: http://0.0.0.0:8000/v1/
  encoding_format: float
  api_key: NA
  model: nvidia/nvclip-vit-h-14
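
With this configuration, the same OpenAI client simply targets the local endpoint instead of the hosted one; a brief sketch, using the values from the self-hosted section above:

# Sketch: the same embeddings call works against a self-hosted NV-CLIP NIM;
# only base_url (and the placeholder api_key) change.
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8000/v1/", api_key="NA")
response = client.embeddings.create(
    input=["a photo of a rabbit"],
    model="nvidia/nvclip-vit-h-14",
    encoding_format="float",
)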

Build The Application

To run the sample application, you must first build a Docker image that includes the sample application and its dependencies:

# Build the Docker image from the root directory of Holohub
./dev_container build --docker_file applications/nvidia_nim/nvidia_nim_nvclip/Dockerfile

Then, run the Docker image:

./dev_container launch

Run the Application

To use the NIMs on build.nvidia.com/, configure your API key in the nvidia_nim.yaml configuration file and run the sample app as follows:

./run launch nvidia_nim_nvclip

Using the Application

Once the application is ready, it will prompt you to enter the URLs of the images you want to perform inference on.

Enter a URL to an image: https://domain.to/my/image-cat.jpg
Downloading image...

Enter a URL to another image or hit ENTER to continue: https://domain.to/my/image-rabbit.jpg
Downloading image...

Enter a URL to another image or hit ENTER to continue: https://domain.to/my/image-dog.jpg
Downloading image...

If there are no more images that you want to use, hit ENTER to continue and then enter a prompt:

Enter a URL to another image or hit ENTER to continue:

Enter a prompt: Which image contains a rabbit?
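
Behind the scenes, each image URL entered at the prompt has to be downloaded and converted into a form the embeddings endpoint accepts. A plausible sketch follows, assuming images are submitted as base64-encoded data URLs (the sample application's exact request format may differ):

# Hedged sketch: download an image and encode it as a base64 data URL, one
# common way to submit images to an embeddings endpoint. The MIME type is
# assumed to be JPEG here for simplicity.
import base64

import requests

def image_url_to_data_url(url: str) -> str:
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    encoded = base64.b64encode(response.content).decode("utf-8")
    return f"data:image/jpeg;base64,{encoded}"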

The application will connect to the NIM to generate embeddings for the images and the prompt, and then calculate the cosine similarity between them:

Generating...
Prompt: Which image contains a rabbit?
Output:
Image 1: 3.0%
Image 2: 52.0%
Image 3: 46.0%
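
The percentages above can be reproduced from the raw embeddings. A sketch of the computation follows, with the caveat that the temperature-scaled softmax is an assumption and the sample application may normalize differently:

# Illustrative sketch: cosine similarity between each image embedding and the
# prompt embedding, converted to percentages with a CLIP-style scaled softmax.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def similarity_percentages(image_embeddings, text_embedding):
    sims = np.array(
        [cosine_similarity(np.asarray(e), np.asarray(text_embedding)) for e in image_embeddings]
    )
    logits = sims * 100.0   # CLIP-style temperature scaling (an assumption here)
    logits -= logits.max()  # subtract the max for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return [round(float(p) * 100, 1) for p in probs]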