
HoloChat

Authors: Nigel Nelson (NVIDIA)
Supported platforms: x86_64, aarch64
Last modified: March 18, 2025
Language: Python
Latest version: 0.2.0
Minimum Holoscan SDK version: 2.0.0
Tested Holoscan SDK versions: 2.0.0
Contribution metric: Level 4 - Experimental


HoloChat is an AI-driven chatbot, built on top of a locally hosted Code-Llama model OR a remote NIM API for Llama-3-70b, which acts as a developer's copilot in Holoscan development. The LLM leverages a vector database composed of the Holoscan SDK repository and user guide, enabling HoloChat to answer general questions about Holoscan as well as act as a Holoscan SDK coding assistant.

HoloChat Demo

Hardware Requirements: 👉💻

  • Processor: x86/Arm64

If running the local LLM:

  • GPU: NVIDIA dGPU with >= 28 GB VRAM
  • Disk: >= 28 GB of available disk space, needed to download the fine-tuned Code Llama 34B and the BGE-Large embedding model

*Tested using NVIDIA IGX Orin w/ RTX A6000 and Dell Precision 5820 Workstation w/ RTX A6000
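Before choosing the local option, you may want to confirm that there is enough free disk space for the model downloads. The sketch below uses only Python's standard `shutil.disk_usage`; the 28 GB threshold comes from the requirements above, the helper names are hypothetical, and the `applications/holochat` path assumes you run it from the HoloHub root. It does not check GPU VRAM.

```python
import shutil
from pathlib import Path

# Free space suggested above for the local LLM option (model weights + embedding model).
REQUIRED_GB = 28


def free_disk_gb(path: str) -> float:
    """Free space, in GB (10**9 bytes), on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / 1e9


def has_room_for_models(path: str, required_gb: float = REQUIRED_GB) -> bool:
    """True if the filesystem containing `path` has at least `required_gb` GB free."""
    return free_disk_gb(path) >= required_gb


if __name__ == "__main__":
    # Assumed location: run from the HoloHub root so the models land under applications/holochat.
    target = Path("applications/holochat")
    check_path = str(target if target.exists() else Path("."))
    print(f"{free_disk_gb(check_path):.1f} GB free at {check_path}; "
          f"enough for local LLM: {has_room_for_models(check_path)}")
```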

Running HoloChat: 🏃💨

When running HoloChat, you have two LLM options:

  • Local: Uses Phind-CodeLlama-34B-v2 running on your local machine using Llama.cpp
  • Remote: Uses Llama-3-70b-Instruct using the NVIDIA NIM API

TL;DR 🥱

To run locally:

./dev_container build_and_run holochat --run_args --local
To run using the NVIDIA NIM API:
echo "NVIDIA_API_KEY=<api_key_here>" > ./applications/holochat/.env

./dev_container build_and_run holochat

Build Notes: ⚙️

Build Time:

  • HoloChat uses a PyTorch container from NGC and may also download the ~23 GB Phind LLM from Hugging Face (when running the local LLM). As a result, the first build of this application will likely take ~45 minutes, depending on your internet speed. This is a one-time setup; subsequent runs of HoloChat should launch in seconds.

Build Location:

  • If running locally: HoloChat downloads ~28 GB of model data to the holochat/models directory. It is therefore recommended to run this application only on a disk drive with ample storage (e.g., the 500 GB SSD included with NVIDIA IGX Orin).

Running Instructions:

If connecting to your machine via SSH, be sure to forward ports 7860 and 8080:

ssh <user_name>@<IP address> -L 7860:localhost:7860 -L 8080:localhost:8080
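After opening the tunnel and starting HoloChat on the remote machine, you can check from your local machine that the forwarded ports accept connections. This is a minimal standard-library sketch; `port_is_open` is a hypothetical helper, and a successful check only shows that something is listening on the port, not that it is HoloChat.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    # Run on your local machine after opening the SSH tunnel.
    for port in (7860, 8080):
        status = "reachable" if port_is_open("localhost", port) else "not reachable"
        print(f"localhost:{port} is {status}")
```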

Running w/ Local LLM 💻

To build and start the app:

./dev_container build_and_run holochat --run_args --local
Once the LLM is loaded on the GPU and the Gradio app is running, HoloChat should be available at http://127.0.0.1:7860/.

Running w/ NIM API ☁️

To use the NIM API, you must create a .env file at:

./applications/holochat/.env
Place your NVIDIA API key in this file using the following format:
NVIDIA_API_KEY=<api_key_here>
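If the app cannot find your key, it can help to confirm that the file is in the expected `KEY=VALUE` form. The sketch below is a hypothetical checker, not how HoloChat itself loads the key (that mechanism is not described here); it handles only simple one-variable-per-line files like the one above, with no quoting or variable expansion.

```python
from pathlib import Path


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines (no quoting, escaping, or variable expansion)."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an '=' separator.
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


if __name__ == "__main__":
    env_path = Path("./applications/holochat/.env")
    if env_path.exists():
        env = parse_env(env_path.read_text())
        # Report only whether the key is present; never print the key itself.
        print("NVIDIA_API_KEY set:", bool(env.get("NVIDIA_API_KEY")))
    else:
        print(f"{env_path} not found")
```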

To build and run the app:

./dev_container build_and_run holochat
Once the Gradio app is running, HoloChat should be available at http://127.0.0.1:7860/.

Usage Notes: 🗒️

Intended use: 🎯

HoloChat is designed to accelerate Holoscan developers’ learning and development. It provides an intuitive chat interface in which users can pose natural language queries about the Holoscan SDK. Whether seeking general information about the SDK or specific coding insights, users can obtain immediate responses from the underlying Large Language Model (LLM) and vector database.

HoloChat is given access to the Holoscan SDK repository, the HoloHub repository, and the Holoscan SDK user guide. This essentially allows users to hold natural language conversations with these documents, giving them instant access to the information they need and sparing them the task of sifting through large amounts of documentation themselves.

Known Limitations: ⚠️🚧

Before diving into how to make the most of HoloChat, it's crucial to understand its known limitations. These limitations motivate the best practices below, which will help you navigate and mitigate these issues effectively.

  • Hallucinations: Occasionally, HoloChat may provide responses that are not entirely accurate. It's advisable to approach answers with a healthy degree of skepticism.
  • Memory Loss: The LLM's limited context window may cause earlier conversation history to be lost. To mitigate this, consider restarting the application to clear the chat history when necessary.
  • Limited Support for Stack Traces: HoloChat's knowledge is based on the Holoscan repository and the user guide, which lack large collections of stack trace data. Consequently, HoloChat may struggle to assist with stack traces.
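The memory loss described above follows from the prompt having a fixed size budget: once the conversation no longer fits, the oldest turns have to be dropped. A minimal sketch of that idea, assuming the history is kept as (user, assistant) pairs and using a whitespace word count as a hypothetical stand-in for the model's tokenizer; HoloChat's actual truncation logic is not described here.

```python
def trim_history(history, max_tokens, count_tokens=lambda text: len(text.split())):
    """Keep the most recent (user, assistant) turns whose combined size fits in max_tokens.

    `count_tokens` defaults to a whitespace word count, a rough stand-in for a real tokenizer.
    """
    kept = []
    used = 0
    # Walk backwards from the newest turn so the most recent context is preserved first.
    for user_msg, bot_msg in reversed(history):
        cost = count_tokens(user_msg) + count_tokens(bot_msg)
        if used + cost > max_tokens:
            break  # This turn and everything older is dropped ("forgotten").
        kept.append((user_msg, bot_msg))
        used += cost
    kept.reverse()  # Restore chronological order.
    return kept


history = [
    ("What is Holoscan?", "(first answer)"),                 # 3 + 2 = 5 words
    ("Which languages are supported?", "(second answer)"),   # 4 + 2 = 6 words
    ("Show a HolovizOp example.", "(third answer)"),         # 4 + 2 = 6 words
]
# With a 12-word budget, only the two newest turns survive.
print(trim_history(history, max_tokens=12))
```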

Best Practices: ✅👍

While users should be aware of the above limitations, following the recommended tips below will greatly reduce these shortcomings. In general, the more detailed and precise a question is, the better the results will be. Some best practices when asking questions are:

  • Be Verbose: If you want to create an application, specify which operators should be used if possible (HolovizOp, V4L2VideoCaptureOp, InferenceOp, etc.).
  • Be Specific: The less open-ended a question is, the less likely the model is to hallucinate.
  • Specify Programming Language: If asking for code, include the desired language (Python or C++).
  • Provide Code Snippets: If debugging errors, include as much relevant information as possible. Copy and paste the code snippet that produces the error and the abbreviated stack trace, and describe any changes that may have introduced the error.
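As an illustration of the last tip, the hypothetical helper below assembles a debugging question from its parts: the language, a short problem statement, the failing code, the tail of the stack trace, and any recent change. It is only one possible way to structure such a question; HoloChat accepts any free-form text.

```python
def compose_debug_question(language, problem, code, stack_trace,
                           recent_change="", max_trace_lines=10):
    """Assemble a debugging question following the best practices above.

    Only the last `max_trace_lines` lines of the stack trace are kept, since a
    Python traceback ends with the exception type and message.
    """
    trace_lines = stack_trace.strip().splitlines()
    short_trace = "\n".join(trace_lines[-max_trace_lines:])
    parts = [
        f"Language: {language}",
        f"Problem: {problem}",
        "Code that produces the error:",
        code.strip(),
        "Abbreviated stack trace:",
        short_trace,
    ]
    if recent_change:
        parts += ["Recent change that may have introduced the error:", recent_change]
    return "\n\n".join(parts)


question = compose_debug_question(
    language="Python",
    problem="My application raises an exception at startup.",
    code="result = 1 / 0",
    stack_trace="Traceback (most recent call last):\n  ...\nZeroDivisionError: division by zero",
    recent_change="Added a division step to my operator's compute method.",
)
print(question)
```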

To demonstrate how to get the most out of HoloChat, two example questions are posed below. These examples illustrate how a user can refine their questions and, as a result, improve the responses they receive:


Worst👎: “Create an app that predicts the labels associated with a video”

Better👌: “Create a Python app that takes video input and sends it through a model for inference.”

Best🙌: “Create a Python Holoscan application that receives streaming video input, and passes that video input into a pytorch classification model for inference. Then, collect the model’s predicted class and use Holoviz to display the class label on each video frame.”


Worst👎: “What os can I use?”

Better👌: “What operating system can I use with Holoscan?”

Best🙌: “Can I use MacOS with the Holoscan SDK?”

Appendix:

Meta Terms of Use:

By using the Code-Llama model, you are agreeing to the terms and conditions of the license, the acceptable use policy, and Meta’s privacy policy.

Implementation Details:

HoloChat works by comparing user input to the text stored in the vector database, which is built from Holoscan SDK information. The most relevant text segments from the SDK code and the user guide are then appended to the user's query. This approach allows the chosen LLM to answer questions about the Holoscan SDK without being explicitly trained on SDK data.

However, this method has a drawback: the most relevant documentation is not always retrieved from the vector database. Because the user's question serves as the search query, queries that are too simplistic or abbreviated may fail to retrieve the most relevant documents. As a consequence, the LLM lacks the necessary context, leading to poor and potentially inaccurate responses. This occurs because LLMs strive to produce the most probable response to a question, and without adequate context, they hallucinate to fill in the knowledge gaps.
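The retrieve-then-append flow described above can be sketched in a few lines. This is a toy illustration, not HoloChat's implementation: a bag-of-words counter stands in for the real embedding model, a Python list of placeholder strings stands in for the vector database, and `embed`, `retrieve`, and `build_prompt` are hypothetical names. It also reproduces the failure mode above: the abbreviated query shares no words with the relevant document and retrieves nothing. A real embedding model is less brittle than exact word overlap, but terse queries can still pull in less relevant context.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: bag-of-words counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2, min_score=0.0):
    """Return up to k chunks whose similarity to the query exceeds min_score, best first."""
    q_vec = embed(query)
    scored = [(cosine(q_vec, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for score, chunk in scored if score > min_score][:k]


def build_prompt(query, chunks, k=2):
    """Append the retrieved context to the user's query before it is sent to the LLM."""
    context = retrieve(query, chunks, k)
    context_text = "\n".join(f"- {c}" for c in context) or "- (no relevant documents found)"
    return f"Question: {query}\n\nRelevant Holoscan SDK context:\n{context_text}"


# Placeholder documents standing in for chunks of the SDK repository and user guide.
docs = [
    "Supported operating system list for the Holoscan SDK.",
    "HolovizOp displays video frames.",
    "InferenceOp runs model inference.",
]
print(retrieve("What os can I use?", docs))                               # -> []
print(retrieve("What operating system can I use with Holoscan?", docs))   # -> ['Supported operating system list for the Holoscan SDK.']
```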