FM ASR¶
Authors: Joshua Martinez (NVIDIA)
Supported platforms: x86_64
Last modified: March 18, 2025
Language: Python
Latest version: 1.0
Minimum Holoscan SDK version: 0.4.1
Tested Holoscan SDK versions: 0.4.1, 0.5.0
Contribution metric: Level 3 - Developmental
This project is a proof-of-concept demo that combines real-time, low-level signal processing with deep learning inference. It currently supports the RTL-SDR. Specifically, this project demonstrates the demodulation, downsampling, and automatic transcription of live, civilian FM radio broadcasts. The pipeline architecture is shown in the figure below.
The primary pipeline segments are written in Python. Future improvements will introduce a fully C++ system.
This project leverages NVIDIA's Holoscan SDK for performant GPU pipelines, the cuSignal package for GPU-accelerated signal processing, and the Riva SDK for high-accuracy automatic speech recognition (ASR).
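To make the demodulation and downsampling steps concrete, the following is a minimal CPU-only sketch in plain Python. It is not the project's implementation, which runs on the GPU through cuSignal; the function names, the 250 kHz I/Q rate, and the block-averaging downsampler are assumptions chosen for illustration. The 75 kHz peak deviation is the standard value for broadcast FM.

```python
import cmath
import math


def fm_modulate(message, sample_rate, deviation_hz):
    """Build complex baseband FM samples from a message with values in [-1, 1]."""
    phase = 0.0
    samples = []
    for m in message:
        # Instantaneous frequency is deviation_hz * m; integrate it into phase.
        phase += 2.0 * math.pi * deviation_hz * m / sample_rate
        samples.append(cmath.exp(1j * phase))
    return samples


def fm_demodulate(iq, sample_rate, deviation_hz):
    """Recover the message from the phase step between consecutive I/Q samples."""
    scale = sample_rate / (2.0 * math.pi * deviation_hz)
    return [
        cmath.phase(cur * prev.conjugate()) * scale
        for prev, cur in zip(iq, iq[1:])
    ]


def downsample(signal, factor):
    """Average each block of `factor` samples (a crude low-pass) and keep one value."""
    return [
        sum(signal[i:i + factor]) / factor
        for i in range(0, len(signal) - factor + 1, factor)
    ]


if __name__ == "__main__":
    sample_rate = 250_000.0  # Hz, hypothetical I/Q rate
    deviation_hz = 75_000.0  # Hz, broadcast FM peak deviation
    # Synthetic 1 kHz test tone standing in for broadcast audio.
    tone = [math.sin(2.0 * math.pi * 1_000.0 * n / sample_rate) for n in range(2_500)]

    iq = fm_modulate(tone, sample_rate, deviation_hz)
    audio = fm_demodulate(iq, sample_rate, deviation_hz)
    audio_lo = downsample(audio, 10)  # 250 kHz -> 25 kHz
    print(len(audio), len(audio_lo))
```

The demodulator multiplies each sample by the conjugate of the previous one, so the angle of the product is the phase advance per sample, which is proportional to the transmitted audio. A production pipeline would typically replace the block average with a proper polyphase resampler to reach the 16 kHz rate that Riva expects.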
Install¶
To begin installation, clone this repository using the following:
git clone https://github.com/nvidia-holoscan/holohub.git
Container-based development and deployment are supported. The supported configurations are explained in the sections that follow.
Local Sensor - Basic Configuration¶
The Local Sensor configuration assumes that the RTL-SDR is connected directly to the GPU-enabled system via USB. I/Q samples are collected from the RTL-SDR directly, using the SoapySDR library. Specialized containers are provided for Jetson devices.
Only two containers are used in this configuration:
- The Application Container, which includes all the necessary low-level libraries, radio drivers, the Holoscan SDK for the core application pipeline, and the Riva client API; and
- The Riva SDK container, which houses the ASR transcription service.
For convenience, container build scripts are provided to automatically build the application containers for Jetson and x86 systems. The Dockerfiles can be readily modified for ARM-based systems with a discrete GPU. To build the container for this configuration, run the following:
# Starting from FM-ASR root directory
cd scripts
./build_application_container.sh # builds Application Container
A script for running the application container is also provided. The run scripts will start the containers and leave the user at a bash terminal for development. Separate launch scripts are provided to automatically run the application.
# Starting from FM-ASR root directory
./scripts/run_application_container.sh
Local Jetson Container¶
Helper scripts will be provided in a future release.
Remote Sensor - Network in the Loop¶
This configuration is currently in progress and will be provided in a future release. Developers can modify this code base to support this configuration if desired.
Bare Metal Install¶
Bare metal installation is not currently supported. It will be added in a future release.
Startup¶
After installation, the following steps are needed to launch the application:
1. Start the Riva ASR service
2. Launch the Application Container
Scripted Launch¶
The above steps are automated by some helper scripts.
# Starting from FM-ASR root directory
./scripts/launch_application.sh # Starts Application Container and launches app using the config file defined in the script
Manual Launch¶
As an alternative to launch_application.sh, the FM-ASR pipeline can be run from inside the Application Container using the following commands:
cd /workspace
export CONFIG_FILE=/workspace/params/holoscan.yml # can be edited by user
python fm_asr_app.py $CONFIG_FILE
Initialize and Start the Riva Service¶
Riva can be set up by following the Quickstart guide (version 2.10.0 is currently supported). In summary, run the following:
cd <riva_quickstart_download_directory>
bash riva_init.sh
bash riva_start.sh
When starting the service, Riva may output a few "retrying" messages. This is normal and not an indication that the service is frozen. Once startup succeeds, you should see the message "Riva server is ready...".
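If you prefer to wait for the service from a script rather than watching the console, a minimal sketch is to poll the Riva port until a TCP connection succeeds. The helper name `wait_for_port` and its timeouts are assumptions for illustration; the address localhost:50051 is the `uri` value listed in the configuration table. An open port only shows that the server is accepting connections, so the "Riva server is ready..." message remains the authoritative signal.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=120.0, poll_s=2.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=poll_s):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll_s)


# Example call (left commented out so importing this file has no side effects):
# ready = wait_for_port("localhost", 50051)
```

The function can be called before launching the application so that the pipeline does not start sending audio to a service that is still loading.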
Note for users with multiple GPUs:
If you want to specify which GPU Riva uses (defaults to device 0), open <riva_quickstart_download_directory>/config.sh and change the line
gpus_to_use="device=0"
to
gpus_to_use="device=<your-device-number>"
# or, to guarantee a specific device
gpus_to_use="device=<your-GPU-UUID>"
GPU UUIDs can be listed by running nvidia-smi -L.
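As a convenience, the lookup can be scripted. The sketch below parses the output of `nvidia-smi -L` and formats the matching `gpus_to_use` line. The helper names are hypothetical, and the regular expression assumes the usual one-line-per-GPU layout, `GPU <index>: <name> (UUID: GPU-...)`; MIG device lines, if present, are ignored.

```python
import re
import subprocess

# Matches lines such as: "GPU 0: <name> (UUID: GPU-...)"
GPU_LINE = re.compile(r"^GPU (\d+): (.+) \(UUID: (GPU-[0-9a-fA-F-]+)\)$")


def parse_gpu_list(text):
    """Parse `nvidia-smi -L` output into (index, name, uuid) tuples."""
    gpus = []
    for line in text.splitlines():
        match = GPU_LINE.match(line.strip())
        if match:
            gpus.append((int(match.group(1)), match.group(2), match.group(3)))
    return gpus


def gpus_to_use_line(uuid):
    """Format the config.sh assignment for a specific GPU UUID."""
    return f'gpus_to_use="device={uuid}"'


def list_gpus():
    """Run `nvidia-smi -L` and parse its output (requires an NVIDIA driver)."""
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=True
    )
    return parse_gpu_list(result.stdout)
```

On a machine with the driver installed, `gpus_to_use_line(list_gpus()[1][2])` would produce the line to paste into config.sh for the second GPU.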
Configuration Parameters¶
A table of the configuration parameters used in this project is shown below, organized by application operator.
Parameter | Type | Description |
---|---|---|
run_time | int | Number of seconds that pipeline will execute |
RtlSdrGeneratorOp | ||
sample_rate | float | Reception sample rate used by the radio. RTL-SDR max stable sample rate without dropping is 2.56e6. |
tune_frequency | float | Tuning frequency for the radio in Hz. |
gain | float | Gain setting for the radio (e.g., 40.0). |
PlayAudioOp | ||
play_audio | bool | Flag used to enable simultaneous audio playback of signal. |
RivaAsrOp | ||
sample_rate | int | Audio sample rate expected by the Riva ASR model. Riva's default is 16000; other values incur an additional resampling operation within Riva. |
max_alternatives | int | Riva - Maximum number of alternative transcripts to return (up to limit configured on server). Setting to 1 returns only the best response. |
word-time-offsets | bool | Riva - Option to output word timestamps in transcript. |
automatic-punctuation | bool | Riva - Flag that controls if transcript should be automatically punctuated. |
uri | str | Riva - Address of the Riva server (e.g., localhost:50051). |
no-verbatim-transcripts | bool | Riva - If specified, text inverse normalization will be applied |
boosted_lm_words | str | Riva - Words to boost when decoding. Useful for handling jargon and acronyms. |
boosted_lm_score | float | Value by which to boost words when decoding |
language-code | str | Riva - Language code of the model to be used. US English is en-US. Check the Riva docs for more options. |
interim_transcriptions | bool | Riva - Flag to include interim transcriptions in the output file. |
ssl_cert | str | Path to SSL client certificates file. Not currently utilized. |
use_ssl | bool | Boolean to control if SSL/TLS encryption should be used. Not currently utilized. |
recognize_interval | int | Specifies the amount of audio data, in seconds, that Riva processes per request. |
TranscriptSinkOp | ||
output_file | str | File path to store a transcript. Existing files will be overwritten. |
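To illustrate how the types and limits in the table could be checked before launching, here is a minimal validation sketch. It is a hypothetical helper, not part of the FM-ASR code base. It assumes the parameters have already been loaded into a nested dictionary keyed by operator name, which may not match the actual layout of the YAML file, and it covers only a subset of the table.

```python
# Hypothetical helper for illustration; it is not part of the FM-ASR code base.
RTL_SDR_MAX_STABLE_RATE = 2.56e6  # Hz, from the parameter table

# Expected types per operator, copied from a subset of the parameter table.
EXPECTED_TYPES = {
    "RtlSdrGeneratorOp": {"sample_rate": float, "tune_frequency": float, "gain": float},
    "PlayAudioOp": {"play_audio": bool},
    "RivaAsrOp": {
        "sample_rate": int,
        "max_alternatives": int,
        "uri": str,
        "recognize_interval": int,
    },
    "TranscriptSinkOp": {"output_file": str},
}


def validate_params(params):
    """Return a list of problems found in a nested {operator: {name: value}} dict."""
    problems = []
    for op_name, fields in EXPECTED_TYPES.items():
        for name, expected in fields.items():
            if name not in params.get(op_name, {}):
                continue  # Missing parameters are not checked in this sketch.
            value = params[op_name][name]
            # bool is a subclass of int in Python, so check it explicitly.
            is_bool = isinstance(value, bool)
            if (expected is bool) != is_bool or not isinstance(value, expected):
                problems.append(f"{op_name}.{name}: expected {expected.__name__}")
    rate = params.get("RtlSdrGeneratorOp", {}).get("sample_rate")
    if isinstance(rate, float) and rate > RTL_SDR_MAX_STABLE_RATE:
        problems.append("RtlSdrGeneratorOp.sample_rate: above 2.56e6, samples may drop")
    return problems


def samples_per_request(audio_sample_rate, recognize_interval):
    """Number of audio samples covered by one Riva request."""
    return audio_sample_rate * recognize_interval
```

For example, with the Riva default rate of 16000 and a hypothetical `recognize_interval` of 5, each request would cover `samples_per_request(16000, 5)` audio samples. The type check is deliberately strict, so an integer supplied for a float field is reported.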
Known Issues¶
This table will be populated as issues are identified.
Issue | Description | Status |
---|---|---|