Chat with NVIDIA NIM

Authors: Holoscan Team (NVIDIA)
Supported platforms: x86_64, aarch64
Last modified: March 18, 2025
Language: Python
Latest version: 1.0
Minimum Holoscan SDK version: 1.0.3
Tested Holoscan SDK versions: 1.0.3, 2.1.0
Contribution metric: Level 1 - Highly Reliable

This is a sample application that shows how to use the OpenAI SDK with NVIDIA Inference Microservices (NIM). It works both with NIMs hosted on build.nvidia.com and with self-hosted NIMs.
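
At its core, the application creates an OpenAI client pointed at a NIM endpoint and sends chat completion requests to it. The snippet below is a minimal sketch of that pattern, not the application's actual code; the model name and API key placeholder are illustrative.

from openai import OpenAI

# Point the OpenAI SDK at a NIM endpoint (NVIDIA hosted endpoint shown here).
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key="YOUR_API_KEY",  # placeholder; see the configuration section below
)

# Send a single chat completion request.
response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",  # illustrative model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)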

Quick Start

  1. Add your API key to nvidia_nim.yaml
  2. ./dev_container build_and_run nvidia_nim_chat

Configuring the sample application

Use the nvidia_nim.yaml configuration file to configure the sample application:

Connection Information

nim:
  base_url: https://integrate.api.nvidia.com/v1
  api_key:

base_url: The URL of your NIM instance. Defaults to NVIDIA hosted NIMs.
api_key: Your API key for accessing NVIDIA hosted NIMs.
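
As a sketch of how these two settings might be consumed, the snippet below loads the nim section of nvidia_nim.yaml with PyYAML and constructs an OpenAI client from it. This is illustrative, not the application's actual code; the file path is assumed to be relative to the working directory.

import yaml
from openai import OpenAI

# Read the connection settings from the configuration file (path assumed).
with open("nvidia_nim.yaml") as f:
    nim_cfg = yaml.safe_load(f)["nim"]

client = OpenAI(base_url=nim_cfg["base_url"], api_key=nim_cfg["api_key"])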

Model Information

The models section of the YAML file is preconfigured with several NVIDIA hosted models. This lets you switch between models easily within the application by sending the /m prompt.

Per-model parameters may also be added or adjusted under each entry in the models section, as sketched below.
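
The following is a hedged sketch of how such per-model parameters could be forwarded to the OpenAI SDK as keyword arguments. The dictionary mirrors the example model entry shown later in this document; the names and values are illustrative.

from openai import OpenAI

# Illustrative per-model entry, as it might appear under "models" in nvidia_nim.yaml.
model_cfg = {
    "model": "meta/llama3-8b-instruct",
    "top_p": 1,
    "n": 1,
    "max_tokens": 1024,
    "frequency_penalty": 1.0,
}

client = OpenAI(base_url="https://integrate.api.nvidia.com/v1", api_key="YOUR_API_KEY")

# Split the model name from its sampling parameters and pass the rest as kwargs.
params = dict(model_cfg)
model_name = params.pop("model")
response = client.chat.completions.create(
    model=model_name,
    messages=[{"role": "user", "content": "Hello!"}],
    **params,
)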

Run the sample application

There are two ways to run the sample application:

Run using Docker

To run the sample application with Docker, you must first build a Docker image that includes the sample application and its dependencies:

# Build the Docker images from the root directory of Holohub
./dev_container build --docker_file applications/nvidia_nim/Dockerfile

Then, run the Docker image:

./dev_container launch

Continue to the Start the Application section once inside the Docker container.

Run the Application without Docker

Install all dependencies from the requirements.txt file:

# optionally create a virtual environment and activate it
python3 -m venv .venv
source .venv/bin/activate

# install the required packages
pip install -r applications/nvidia_nim/chat/requirements.txt

Start the Application

To use the NIMs on build.nvidia.com, configure your API key in the nvidia_nim.yaml configuration file and run the sample app as follows:

Note: you may also configure your API key using an environment variable, e.g., export API_KEY=...

# To use NVIDIA hosted NIMs available on build.nvidia.com, export your API key first
export API_KEY=[enter your api key here]

./run launch nvidia_nim_chat
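
For reference, here is a minimal sketch of the environment-variable fallback described in the note above. The variable name API_KEY follows the export shown, and the YAML layout is assumed to match the configuration section; this is illustrative, not the application's actual code.

import os
import yaml
from openai import OpenAI

with open("nvidia_nim.yaml") as f:
    nim_cfg = yaml.safe_load(f)["nim"]

# Prefer the API_KEY environment variable; fall back to the YAML value.
api_key = os.environ.get("API_KEY") or nim_cfg.get("api_key")
client = OpenAI(base_url=nim_cfg["base_url"], api_key=api_key)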

Have fun!

Connecting with Locally Hosted NIMs

To use a locally hosted NIM, first download and start the NIM. Then configure the base_url parameter in the nvidia_nim.yaml configuration file to point to your local NIM instance.

The following example shows a NIM running locally, serving its APIs and the meta-llama3-8b-instruct model at http://0.0.0.0:8000/v1.

nim:
  base_url: http://0.0.0.0:8000/v1/

models:
  llama3-8b-instruct:
    model: meta-llama3-8b-instruct # name of the model served by the NIM
    # add/update/remove the following key/value pairs to configure the parameters for the model
    top_p: 1
    n: 1
    max_tokens: 1024
    frequency_penalty: 1.0
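
Once the local NIM is running, you can sanity-check the endpoint by listing the models it serves. The snippet below is a sketch; the assumption that a local NIM accepts a placeholder API key may not hold for every deployment.

from openai import OpenAI

# Point the client at the locally hosted NIM endpoint from the example above.
client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")  # placeholder key (assumption)

# List the served models to confirm the endpoint is reachable.
for model in client.models.list():
    print(model.id)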