FM ASR

Authors: Joshua Martinez (NVIDIA)
Supported platforms: x86_64
Last modified: March 18, 2025
Language: Python
Latest version: 1.0
Minimum Holoscan SDK version: 0.4.1
Tested Holoscan SDK versions: 0.4.1, 0.5.0
Contribution metric: Level 3 - Developmental
This project is a proof-of-concept demo that combines real-time, low-level signal processing with deep learning inference. It currently supports the RTL-SDR. Specifically, the project demonstrates the demodulation, downsampling, and automatic transcription of live civilian FM radio broadcasts. The pipeline architecture is shown in the figure below.

Pipeline Architecture

The primary pipeline segments are written in Python. Future improvements will introduce a fully C++ system.

This project leverages NVIDIA's Holoscan SDK for performant GPU pipelines, the cuSignal package for GPU-accelerated signal processing, and the Riva SDK for high-accuracy automatic speech recognition (ASR).
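To make the demodulation and downsampling stages concrete, the following is a minimal CPU-only sketch in NumPy. It is not the project's cuSignal implementation: the function names, the 256 kHz input rate, and the block-averaging decimator (a crude stand-in for a proper polyphase resampler) are all assumptions chosen for illustration. It modulates a synthetic 1 kHz tone, demodulates it from the phase difference between consecutive I/Q samples, and reduces the result to a 16 kHz audio stream.

```python
import numpy as np


def fm_demodulate(iq: np.ndarray) -> np.ndarray:
    """Recover the message from complex baseband FM samples.

    The instantaneous frequency is proportional to the phase change between
    consecutive samples, which is the angle of x[n] * conj(x[n-1]).
    """
    return np.angle(iq[1:] * np.conj(iq[:-1]))


def decimate_by_averaging(x: np.ndarray, factor: int) -> np.ndarray:
    """Crude low-pass and downsample: average each block of `factor` samples."""
    usable = (len(x) // factor) * factor  # drop the ragged tail
    return x[:usable].reshape(-1, factor).mean(axis=1)


# Synthetic test signal: a 1 kHz tone frequency-modulated onto baseband I/Q.
sdr_rate = 256_000    # illustrative SDR sample rate in Hz (assumption)
audio_rate = 16_000   # audio rate in Hz, matching the Riva default
deviation = 75_000    # typical broadcast FM peak deviation in Hz
t = np.arange(sdr_rate // 10) / sdr_rate  # 0.1 s of sample times
message = np.cos(2 * np.pi * 1_000 * t)
phase = 2 * np.pi * deviation * np.cumsum(message) / sdr_rate
iq = np.exp(1j * phase)

# Demodulate, then reduce the sample rate by an integer factor of 16.
audio = decimate_by_averaging(fm_demodulate(iq), sdr_rate // audio_rate)

# Locate the strongest spectral component of the recovered audio.
spectrum = np.abs(np.fft.rfft(audio - audio.mean()))
peak_hz = np.fft.rfftfreq(len(audio), d=1 / audio_rate)[np.argmax(spectrum)]
print(f"Recovered tone near {peak_hz:.0f} Hz")
```

Block averaging only works for integer rate ratios and has a poor stopband, so a real pipeline would normally use a filtered rational resampler instead.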

Install

To begin installation, clone this repository using the following:

git clone https://github.com/nvidia-holoscan/holohub.git
NVIDIA Riva is required to perform the automated transcriptions. If you have not already done so, install and configure the NGC-CLI tool to obtain the Riva container and API. The Riva installation steps may be found at this link: Riva-Install. Note that Riva performs a TensorRT build during setup and requires access to the targeted GPU. This project has been tested with Riva 2.10.0.

Container-based development and deployment is supported. The supported configurations are explained in the sections that follow.

Local Sensor - Basic Configuration

The Local Sensor configuration assumes that the RTL-SDR is connected directly to the GPU-enabled system via USB. I/Q samples are collected from the RTL-SDR directly, using the SoapySDR library. Specialized containers are provided for Jetson devices.

Only two containers are used in this configuration:

- The Application Container, which includes all the necessary low-level libraries, radio drivers, the Holoscan SDK for the core application pipeline, and the Riva client API; and
- The Riva SDK container, which houses the ASR transcription service.

LocalSensor

For convenience, container build scripts are provided to automatically build the application containers for Jetson and x86 systems. The Dockerfiles can be readily modified for ARM-based systems with a discrete GPU. To build the container for this configuration, run the following:

# Starting from FM-ASR root directory
cd scripts
./build_application_container.sh # builds Application Container
Note that this script does not build the Riva container.

A script for running the application container is also provided. The run scripts will start the containers and leave the user at a bash terminal for development. Separate launch scripts are provided to automatically run the application.

# Starting from FM-ASR root directory
./scripts/run_application_container.sh

Local Jetson Container

Helper scripts will be provided in a future release.

Remote Sensor - Network in the Loop

This configuration is currently in progress and will be provided in a future release. Developers can modify this code base to support this configuration if desired.

Bare Metal Install

Bare metal installation is not currently supported. It will be added in a future release.

Startup

After installation, the following steps are needed to launch the application:

1. Start the Riva ASR service
2. Launch the Application Container

Scripted Launch

The above steps are automated by some helper scripts.

# Starting from FM-ASR root directory
./scripts/launch_application.sh # Starts Application Container and launches app using the config file defined in the script

Manual Launch

As an alternative to launch_application.sh, the FM-ASR pipeline can be run from inside the Application Container using the following commands:

cd /workspace
export CONFIG_FILE=/workspace/params/holoscan.yml # can be edited by user
python fm_asr_app.py $CONFIG_FILE

Initialize and Start the Riva Service

Riva can be set up by following the Quickstart guide (version 2.10.0 is currently supported). In summary, run the following:

cd <riva_quickstart_download_directory>
bash riva_init.sh
bash riva_start.sh
The initialization step takes a while to complete but only needs to be done once. Riva requires a capable GPU to set up and run properly. If your system has insufficient resources, the initialization script may hang.

When starting the service, Riva may output a few "retrying" messages. This is normal and not an indication that the service is frozen. You should see a message saying Riva server is ready... once successful.
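Because the service can take some time to come up, it may help to wait for the Riva endpoint before launching the application. The sketch below is a hypothetical helper, not part of this project: it polls a TCP port using only the Python standard library. It assumes the default address `localhost:50051` listed for the `uri` parameter, and a successful connection only shows that the port is open, not that the ASR models have finished loading.

```python
import socket
import time


def wait_for_service(host: str, port: int, timeout_s: float = 60.0,
                     poll_s: float = 2.0) -> bool:
    """Return True once a TCP connection to host:port succeeds.

    Returns False if no connection succeeds before roughly `timeout_s`
    seconds have elapsed. Each attempt waits at most `poll_s` seconds.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=poll_s):
                return True
        except OSError:
            # Refused or timed out: the service is not listening yet.
            time.sleep(poll_s)
    return False


# Example usage (commented out so importing this file does not block):
# if not wait_for_service("localhost", 50051):
#     raise SystemExit("Riva endpoint did not become reachable in time")
```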

Note for users with multiple GPUs:

If you want to specify which GPU Riva uses (defaults to device 0), open and edit <riva_quickstart_download_directory>/config.sh, then change the line

gpus_to_use="device=0"
to
gpus_to_use="device=<your-device-number>"
# or, to guarantee a specific device
gpus_to_use="device=<your-GPU-UUID>"
You can determine your GPUs' UUIDs by running nvidia-smi -L.
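As a convenience, the UUID lookup can be scripted. The following sketch parses the text printed by `nvidia-smi -L` and formats the matching `gpus_to_use` line for config.sh. The helper names are hypothetical, the sample output uses made-up UUIDs, and the regular expression assumes the usual `GPU <index>: <name> (UUID: GPU-...)` line layout, so verify it against the output on your system.

```python
import re
import subprocess

# One line per GPU, e.g. "GPU 0: <name> (UUID: GPU-xxxxxxxx-...)".
_GPU_LINE = re.compile(r"^GPU (\d+): (.+) \(UUID: (GPU-[0-9a-fA-F-]+)\)\s*$")


def parse_gpu_list(text: str) -> dict:
    """Map each GPU index to a (name, uuid) tuple from `nvidia-smi -L` text."""
    gpus = {}
    for line in text.splitlines():
        match = _GPU_LINE.match(line.strip())
        if match:
            gpus[int(match.group(1))] = (match.group(2), match.group(3))
    return gpus


def gpus_to_use_line(uuid: str) -> str:
    """Format the config.sh assignment that pins Riva to one GPU."""
    return f'gpus_to_use="device={uuid}"'


def list_gpus() -> dict:
    """Run `nvidia-smi -L` and parse its output (requires an NVIDIA driver)."""
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True,
                            text=True, check=True)
    return parse_gpu_list(result.stdout)


# Illustrative output with made-up UUIDs, so no GPU is needed to try this.
sample = (
    "GPU 0: NVIDIA RTX A6000 (UUID: GPU-11111111-2222-3333-4444-555555555555)\n"
    "GPU 1: NVIDIA RTX A6000 (UUID: GPU-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee)\n"
)
gpus = parse_gpu_list(sample)
config_line = gpus_to_use_line(gpus[1][1])
print(config_line)
```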

Configuration Parameters

A table of the configuration parameters used in this project is shown below, organized by application operator.

| Parameter | Type | Description |
| --- | --- | --- |
| run_time | int | Number of seconds that the pipeline will execute. |
| RtlSdrGeneratorOp | | |
| sample_rate | float | Reception sample rate used by the radio. The maximum stable RTL-SDR sample rate without dropped samples is 2.56e6. |
| tune_frequency | float | Tuning frequency for the radio in Hz. |
| gain | float | 40.0 |
| PlayAudioOp | | |
| play_audio | bool | Flag used to enable simultaneous audio playback of the signal. |
| RivaAsrOp | | |
| sample_rate | int | Audio sample rate expected by the Riva ASR model. The Riva default is 16000; other values incur an additional resample operation within Riva. |
| max_alternatives | int | Riva - Maximum number of alternative transcripts to return (up to the limit configured on the server). Setting to 1 returns only the best response. |
| word-time-offsets | bool | Riva - Option to output word timestamps in the transcript. |
| automatic-punctuation | bool | Riva - Flag that controls whether the transcript is automatically punctuated. |
| uri | str | localhost:50051 |
| no-verbatim-transcripts | bool | Riva - If specified, text inverse normalization will be applied. |
| boosted_lm_words | str | Riva - Words to boost when decoding. Useful for handling jargon and acronyms. |
| boosted_lm_score | float | Value by which to boost words when decoding. |
| language-code | str | Riva - Language code of the model to be used. US English is en-US. Check the Riva docs for more options. |
| interim_transcriptions | bool | Riva - Flag to include interim transcriptions in the output file. |
| ssl_cert | str | Path to the SSL client certificates file. Not currently utilized. |
| use_ssl | bool | Boolean to control whether SSL/TLS encryption is used. Not currently utilized. |
| recognize_interval | int | Specifies the amount of data Riva processes per request, in time (s). |
| TranscriptSinkOp | | |
| output_file | str | File path to store a transcript. Existing files will be overwritten. |
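Several of these parameters interact, so a quick sanity check before a run can be useful. The sketch below is illustrative only: the dictionary layout, the chosen values (a 250 kHz radio rate, a 101.1 MHz station, a 5 s interval), and the helper names are assumptions rather than the contents of the project's holoscan.yml. It derives the rational resampling ratio between the radio and audio sample rates, and the number of audio samples that one `recognize_interval` represents.

```python
from fractions import Fraction

# Hypothetical values grouped by operator; the real config layout may differ.
params = {
    "RtlSdrGeneratorOp": {
        "sample_rate": 250e3,      # radio sample rate (illustrative)
        "tune_frequency": 101.1e6,  # station frequency in Hz (illustrative)
        "gain": 40.0,
    },
    "RivaAsrOp": {
        "sample_rate": 16000,       # Riva default audio rate
        "recognize_interval": 5,    # seconds of audio per request
        "uri": "localhost:50051",
    },
}

MAX_STABLE_SDR_RATE = 2.56e6  # RTL-SDR limit quoted in the table


def resample_ratio(sdr_rate: float, audio_rate: int) -> Fraction:
    """Reduced up/down ratio that converts the radio rate to the audio rate."""
    return Fraction(int(audio_rate), int(sdr_rate))


def samples_per_request(audio_rate: int, recognize_interval: int) -> int:
    """Number of audio samples covered by one recognition request."""
    return int(audio_rate) * int(recognize_interval)


sdr = params["RtlSdrGeneratorOp"]
asr = params["RivaAsrOp"]

if sdr["sample_rate"] > MAX_STABLE_SDR_RATE:
    raise ValueError("sample_rate exceeds the stable RTL-SDR maximum")

ratio = resample_ratio(sdr["sample_rate"], asr["sample_rate"])
chunk = samples_per_request(asr["sample_rate"], asr["recognize_interval"])
print(f"resample up={ratio.numerator}, down={ratio.denominator}")
print(f"{chunk} audio samples per request")
```

With these example values the ratio reduces to 8/125 and each request covers 80000 samples.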

Known Issues

This table will be populated as issues are identified.

| Issue | Description | Status |
| --- | --- | --- |