# HoloChat

- Authors: Nigel Nelson (NVIDIA)
- Supported platforms: x86_64, aarch64
- Last modified: March 18, 2025
- Language: Python
- Latest version: 0.2.0
- Minimum Holoscan SDK version: 2.0.0
- Tested Holoscan SDK versions: 2.0.0
- Contribution metric: Level 4 - Experimental
HoloChat is an AI-driven chatbot, built on top of a locally hosted Code-Llama model OR a remote NIM API for Llama-3-70b, which acts as a developer's copilot for Holoscan development. The LLM leverages a vector database built from the Holoscan SDK repository and user guide, enabling HoloChat to answer general questions about Holoscan as well as act as a Holoscan SDK coding assistant.
## Hardware Requirements: 💻

- Processor: x86_64 or Arm64

If running the local LLM:

- GPU: NVIDIA dGPU w/ >= 28 GB VRAM
- Storage: >= 28 GB of available disk space
  - Needed to download the fine-tuned Code Llama 34B and BGE-Large embedding models

*Tested using NVIDIA IGX Orin w/ RTX A6000 and Dell Precision 5820 Workstation w/ RTX A6000*
## Running HoloChat: 🏃💨

When running HoloChat, you have two LLM options:

- Local: Phind-CodeLlama-34B-v2 running on your local machine via Llama.cpp
- Remote: Llama-3-70b-Instruct accessed through the NVIDIA NIM API
### TLDR; 🥱

To run locally:

```bash
./dev_container build_and_run holochat --run_args --local
```

To run using the NIM API:

```bash
echo "NVIDIA_API_KEY=<api_key_here>" > ./applications/holochat/.env
./dev_container build_and_run holochat
```
### Build Notes: ⚙️

Build Time:

- HoloChat uses a PyTorch container from NGC and may also download the ~23 GB Phind LLM from HuggingFace. As such, the first build of this application will likely take ~45 minutes, depending on your internet speed. However, this is a one-time setup, and subsequent runs of HoloChat should take seconds to launch.

Build Location:

- If running locally: HoloChat downloads ~28 GB of model data to the `holochat/models` directory. As such, it is recommended to only run this application on a disk drive with ample storage (ex: the 500 GB SSD included with NVIDIA IGX Orin).
### Running Instructions:

If connecting to your machine via SSH, be sure to forward ports 7860 and 8080:

```bash
ssh <user_name>@<IP address> -L 7860:localhost:7860 -L 8080:localhost:8080
```
#### Running w/ Local LLM 💻

To build and start the app:

```bash
./dev_container build_and_run holochat --run_args --local
```
#### Running w/ NIM API ☁️

To use the NIM API, you must create a `.env` file at `./applications/holochat/.env` containing your API key:

```
NVIDIA_API_KEY=<api_key_here>
```
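For reference, a `.env` file like this is conventionally read with the `python-dotenv` package. Below is a minimal sketch of that pattern; whether HoloChat loads the key exactly this way is an assumption for illustration, though the `NVIDIA_API_KEY` variable name matches the file above.

```python
# Minimal sketch: reading the NIM API key from the .env file created above.
# Assumes python-dotenv; HoloChat's actual loading code may differ.
import os

from dotenv import load_dotenv

# Path from the step above; adjust if running from a different directory.
load_dotenv("./applications/holochat/.env")

api_key = os.environ["NVIDIA_API_KEY"]  # raises KeyError if the key is missing
```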
To build and run the app:

```bash
./dev_container build_and_run holochat
```
## Usage Notes: 🗒️

### Intended use: 🎯
HoloChat is developed to accelerate and assist Holoscan developers' learning and development. HoloChat serves as an intuitive chat interface, enabling users to pose natural language queries related to the Holoscan SDK. Whether seeking general information about the SDK or specific coding insights, users can obtain immediate responses thanks to the underlying Large Language Model (LLM) and vector database.
HoloChat is given access to the Holoscan SDK repository, the HoloHub repository, and the Holoscan SDK user guide. This essentially allows users to engage in natural language conversations with these documents, gaining instant access to the information they need, thus sparing them the task of sifting through vast amounts of documentation themselves.
### Known Limitations: ⚠️🚧

Before diving into how to make the most of HoloChat, it's crucial to understand and acknowledge its known limitations. These limitations can guide you in adopting the best practices below, which will help you navigate and mitigate these issues effectively.

- Hallucinations: Occasionally, HoloChat may provide responses that are not entirely accurate. It's advisable to approach answers with a healthy degree of skepticism.
- Memory Loss: The LLM's limited attention window may lead to the loss of earlier conversation history. To mitigate this, consider restarting the application to clear the chat history when necessary.
- Limited Support for Stack Traces: HoloChat's knowledge is based on the Holoscan repository and the user guide, which lack large collections of stack trace data. Consequently, HoloChat may face challenges when assisting with stack traces.
### Best Practices: ✅😁

While users should be aware of the above limitations, following the recommended tips below will drastically minimize these possible shortcomings. In general, the more detailed and precise a question is, the better the results will be. Some best practices when asking questions are:

- Be Verbose: If you want to create an application, specify which operators should be used if possible (HolovizOp, V4L2VideoCaptureOp, InferenceOp, etc.).
- Be Specific: The less open-ended a question is, the less likely the model is to hallucinate.
- Specify Programming Language: If asking for code, include the desired language (Python or C++).
- Provide Code Snippets: If debugging errors, include as much relevant information as possible. Copy and paste the code snippet that produces the error and the abbreviated stack trace, and describe any changes that may have introduced the error.
In order to demonstrate how to get the most out of HoloChat, two example questions are posed below. These examples illustrate how a user can refine their questions and, as a result, improve the responses they receive:
Worst👎: "Create an app that predicts the labels associated with a video"

Better👍: "Create a Python app that takes video input and sends it through a model for inference."

Best👌: "Create a Python Holoscan application that receives streaming video input, and passes that video input into a PyTorch classification model for inference. Then, collect the model's predicted class and use Holoviz to display the class label on each video frame."

Worst👎: "What os can I use?"

Better👍: "What operating system can I use with Holoscan?"

Best👌: "Can I use MacOS with the Holoscan SDK?"
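To make the payoff of specificity concrete, here is a minimal sketch of the kind of pipeline skeleton the first "Best" question above steers toward. It uses only the capture-to-display operators named in the best practices list; the inference stage is omitted, and `device="/dev/video0"` is an assumed default rather than anything HoloChat prescribes.

```python
# Minimal sketch of a capture-to-display Holoscan pipeline, the skeleton that
# the first "Best" question above asks HoloChat to build on. The inference
# stage is intentionally omitted; /dev/video0 is an assumed camera device.
from holoscan.core import Application
from holoscan.operators import HolovizOp, V4L2VideoCaptureOp


class VideoApp(Application):
    def compose(self):
        # Capture frames from a V4L2 camera device.
        source = V4L2VideoCaptureOp(self, name="source", device="/dev/video0")
        # Display frames in a Holoviz window.
        visualizer = HolovizOp(self, name="visualizer")
        # Connect the camera's output port to Holoviz's input port.
        self.add_flow(source, visualizer, {("signal", "receivers")})


if __name__ == "__main__":
    VideoApp().run()
```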
## Appendix:

### Meta Terms of Use:

By using the Code-Llama model, you are agreeing to the terms and conditions of Meta's license, acceptable use policy, and privacy policy.
### Implementation Details:
HoloChat operates by taking user input and comparing it to the text stored within the vector database, which is built from Holoscan SDK information. The most relevant text segments from SDK code and the user guide are then appended to the user's query. This approach allows the chosen LLM to answer questions about the Holoscan SDK without being explicitly trained on SDK data.
However, there is a drawback to this method: the most relevant documentation is not always retrieved from the vector database. Since the user's question serves as the search query, queries that are too simplistic or abbreviated may fail to extract the most relevant documents. As a consequence, the LLM will then lack the necessary context, leading to poor and potentially inaccurate responses. This occurs because LLMs strive to provide the most probable response to a question, and without adequate context, they hallucinate to fill in these knowledge gaps.
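As a rough illustration of this retrieval-augmented flow, the sketch below embeds a question, retrieves the most similar stored chunks, and prepends them to the prompt. It uses `sentence-transformers` (with the BGE-Large model mentioned above) and `chromadb` as stand-in components; the database path, collection name, and prompt template are illustrative assumptions, not HoloChat's actual implementation.

```python
# Rough sketch of the retrieval-augmented flow described above, using
# sentence-transformers and chromadb as stand-in components. The database
# path, collection name, and prompt template are illustrative assumptions.
import chromadb
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("BAAI/bge-large-en")  # BGE-Large, as noted above
client = chromadb.PersistentClient(path="./holochat_db")  # assumed path
collection = client.get_or_create_collection("holoscan_docs")  # assumed name


def build_prompt(question: str, n_chunks: int = 5) -> str:
    """Embed the user's question, retrieve similar SDK text, and build a prompt."""
    query_embedding = embedder.encode(question).tolist()
    results = collection.query(query_embeddings=[query_embedding], n_results=n_chunks)
    context = "\n\n".join(results["documents"][0])
    # Prepend the retrieved SDK excerpts so the LLM can ground its answer
    # without having been explicitly trained on SDK data.
    return (
        "Use the following Holoscan SDK excerpts to answer the question.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

Note that the user's question itself is the search query here, which is why an overly terse question can retrieve unhelpful context and degrade the final answer.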