A Study using Asynchronous Lock-free Buffer with SCHED_DEADLINE#
Authors: Holoscan Team (NVIDIA)
Supported platforms: x86_64, aarch64
Last modified: August 20, 2025
Latest version: 1.0.0
Minimum Holoscan SDK version: 3.5.0
Tested Holoscan SDK versions: 3.5.0
Contribution metric: Level 3 - Developmental
This tutorial demonstrates how using an asynchronous lock-free buffer with the SCHED_DEADLINE scheduling policy in Linux affects message latency in a Holoscan SDK application, and compares it with the default buffer.
Application Configuration#
The application source code is provided in the application directory.
The application consists of two PingTxOp operators (tx1 and tx2) and one PingRxOp operator (rx). Both tx1 and tx2 generate messages and send them to rx.
graph TD;
tx1("tx1") -->|out -> in1| rx("rx");
tx2("tx2") -->|out -> in2| rx("rx");
- tx1: Sends messages after busy-waiting for 5ms. The busy-wait is representative of reading sensor data and a brief processing time.
- tx2: Sends messages after busy-waiting for 10ms. The busy-wait is representative of reading sensor data and a brief processing time.
- rx: Receives messages from both tx1 and tx2, and calculates the message latency and the inter-message interval. It then waits for 1ms, which could be representative of actuating a signal after a brief processing time.
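The busy-waits described above differ from a sleep in one way that matters here: a busy-waiting thread stays on the CPU, so the wait counts against the CPU time the thread consumes. The following is a minimal sketch of such a wait. The helper name `busy_wait` is hypothetical and is not taken from the application source.

```python
import time


def busy_wait(duration_s: float) -> None:
    """Spin on the CPU until duration_s seconds have elapsed.

    Hypothetical helper for illustration; the real operators may
    implement their busy-wait differently.
    """
    end = time.perf_counter() + duration_s
    while time.perf_counter() < end:
        pass  # keep the thread running instead of yielding the CPU


if __name__ == "__main__":
    start = time.perf_counter()
    busy_wait(0.005)  # 5ms, the tx1 value from this tutorial
    print(f"spun for {(time.perf_counter() - start) * 1e3:.2f} ms")
```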
The connection between the operators is configured to use either the default buffer (DoubleBuffer) or an async lock-free buffer.
Experiment Run Instructions#
Application-specific run instructions are provided in the application directory.
The following is an example application run instruction:
./holohub run async_buffer_deadline --as-root --docker-opts='--ulimit rtprio=99 --cap-add=CAP_SYS_NICE'
The application must be run with root privileges and the additional Docker options shown in the command because the SCHED_DEADLINE Linux scheduling policy requires them.
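For context on why elevated privileges are involved, the sketch below shows roughly what requesting SCHED_DEADLINE looks like at the Linux system-call level. It is an illustration only and is not how the Holoscan SDK or this application configures its operators. The helper names `make_deadline_attr` and `apply_to_current_thread` are hypothetical, the 5ms/20ms values are examples, and the system-call numbers are given only for x86_64 and aarch64.

```python
import ctypes
import os
import platform

SCHED_DEADLINE = 6  # policy number from <linux/sched.h>
# sched_setattr system-call numbers (only these two architectures are covered).
SYS_SCHED_SETATTR = {"x86_64": 314, "aarch64": 274}


class SchedAttr(ctypes.Structure):
    """Mirror of the kernel's struct sched_attr (version 0 layout)."""
    _fields_ = [
        ("size", ctypes.c_uint32),
        ("sched_policy", ctypes.c_uint32),
        ("sched_flags", ctypes.c_uint64),
        ("sched_nice", ctypes.c_int32),
        ("sched_priority", ctypes.c_uint32),
        ("sched_runtime", ctypes.c_uint64),   # nanoseconds
        ("sched_deadline", ctypes.c_uint64),  # nanoseconds
        ("sched_period", ctypes.c_uint64),    # nanoseconds
    ]


def make_deadline_attr(runtime_ns: int, deadline_ns: int, period_ns: int) -> SchedAttr:
    """Build a SCHED_DEADLINE request; the kernel expects runtime <= deadline <= period."""
    if not (0 < runtime_ns <= deadline_ns <= period_ns):
        raise ValueError("need 0 < runtime <= deadline <= period")
    attr = SchedAttr()
    attr.size = ctypes.sizeof(SchedAttr)
    attr.sched_policy = SCHED_DEADLINE
    attr.sched_runtime = runtime_ns
    attr.sched_deadline = deadline_ns
    attr.sched_period = period_ns
    return attr


def apply_to_current_thread(attr: SchedAttr) -> None:
    """Call sched_setattr(0, &attr, 0); typically fails without CAP_SYS_NICE."""
    nr = SYS_SCHED_SETATTR[platform.machine()]
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    rc = libc.syscall(ctypes.c_long(nr), ctypes.c_int(0),
                      ctypes.byref(attr), ctypes.c_uint(0))
    if rc != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


if __name__ == "__main__":
    # Example values only: 5ms of runtime every 20ms.
    attr = make_deadline_attr(5_000_000, 20_000_000, 20_000_000)
    try:
        apply_to_current_thread(attr)
        print("SCHED_DEADLINE applied")
    except (OSError, KeyError) as exc:
        print(f"could not apply SCHED_DEADLINE: {exc!r}")
```

Without the required capability the call is expected to be rejected by the kernel, which is why the sketch catches the error instead of assuming success.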
Experiment Scripts#
To run all the experiments in this tutorial:
./tutorials/async_buffer_deadline/run_experiment.sh
The above script will run the experiment and generate the plots in period_variation under this directory.
Experimental Setup#
The experiment demonstrates how an async lock-free buffer allows Holoscan SDK operators to run with different SCHED_DEADLINE periods independently, without being affected by each other's periods or runtimes.
We measure two key metrics:
- Max Message Latency: The highest latency observed for messages generated at tx1 (or at tx2) and then received at rx.
- Max Message Interval: The longest time between two consecutive messages from tx1 (or from tx2), as received at rx.
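The two metrics above can be computed from per-message timestamps. The following is a minimal sketch, assuming each message has a send timestamp recorded at the transmitter and a receive timestamp recorded at rx, both in milliseconds on the same clock. The function name and the sample values are hypothetical and are not taken from the application source.

```python
from typing import Sequence, Tuple


def max_latency_and_interval(
    send_times_ms: Sequence[float],
    recv_times_ms: Sequence[float],
) -> Tuple[float, float]:
    """Return (max message latency, max message interval) in milliseconds.

    Latency is receive time minus send time for each message.
    Interval is the gap between consecutive receive times.
    """
    if not send_times_ms or len(send_times_ms) != len(recv_times_ms):
        raise ValueError("need equally sized, non-empty timestamp lists")
    latencies = [r - s for s, r in zip(send_times_ms, recv_times_ms)]
    intervals = [b - a for a, b in zip(recv_times_ms, recv_times_ms[1:])]
    # A single message has no interval, so report 0.0 in that case.
    return max(latencies), (max(intervals) if intervals else 0.0)


if __name__ == "__main__":
    # Made-up timestamps for three messages.
    sent = [0.0, 20.0, 40.0]
    received = [1.0, 23.0, 41.0]
    print(max_latency_and_interval(sent, received))
```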
We run two main scenarios, each with a fixed rx period of 10ms:
- Fixed tx1 Period, Varying tx2 Period:
  - tx1 period is fixed at 20ms.
  - tx2 period is varied from 20ms to 100ms.
  - We measure the impact on tx1's latency and message interval.
  - Since the periods of tx1 and rx are fixed, the message timings of tx1 should ideally not be affected by the varying tx2 period.
- Fixed tx2 Period, Varying tx1 Period:
  - tx2 period is fixed at 20ms.
  - tx1 period is varied from 20ms to 100ms.
  - We measure the impact on tx2's latency and message interval.
  - Since the periods of tx2 and rx are fixed, the message timings of tx2 should ideally not be affected by the varying tx1 period.
Results#
The results show that with the default buffer, the performance of one operator is heavily dependent on the other. With the async lock-free buffer, however, the operators are decoupled, which enables true asynchronous execution.
TX1 Message Latency vs. TX2 Period#
With the default buffer, as tx2's period increases, tx1's maximum latency also increases linearly. Since rx cannot run before both upstream operators (tx1 and tx2) have written messages for it, the latency of tx1 is affected by tx2. With the async lock-free buffer, tx1's latency remains consistent and low, regardless of tx2's period. The async lock-free buffer therefore makes the connection between tx1 and rx independent of the behavior of tx2.
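The coupling described above can be illustrated with a toy timing model. This is a sketch under simplifying assumptions, not the Holoscan SDK implementation and not the measurement code behind the plots: all times are integer milliseconds, each transmitter emits at a fixed offset (its busy-wait time) after each of its period boundaries, rx is released at multiples of its own period, and scheduling jitter is ignored. In the blocking variant, rx is assumed to fire only once both inputs hold a message, and each transmitter is assumed to wait until its previous message has been consumed. In the latest-value variant, rx simply reads whatever tx1 wrote most recently. All function names are hypothetical.

```python
def next_emit(period_ms: int, work_ms: int, after_ms: int) -> int:
    """Earliest emission time k*period + work (k >= 0) strictly later than after_ms."""
    k = max(0, (after_ms - work_ms) // period_ms + 1)
    return k * period_ms + work_ms


def next_rx_slot(rx_period_ms: int, ready_ms: int) -> int:
    """First rx release time (a multiple of rx_period_ms) at or after ready_ms."""
    return -(-ready_ms // rx_period_ms) * rx_period_ms


def max_tx1_latency_blocking(p1: int, p2: int, rx_period: int,
                             work1: int = 5, work2: int = 10,
                             cycles: int = 50) -> int:
    """Toy model of a buffer where rx needs a message on both inputs."""
    last_fire = 0
    worst = 0
    for _ in range(cycles):
        e1 = next_emit(p1, work1, last_fire)
        e2 = next_emit(p2, work2, last_fire)
        # rx has to wait for the slower of the two transmitters.
        last_fire = next_rx_slot(rx_period, max(e1, e2))
        worst = max(worst, last_fire - e1)
    return worst


def max_tx1_latency_latest_value(p1: int, rx_period: int,
                                 work1: int = 5, cycles: int = 50) -> int:
    """Toy model of a latest-value buffer; the tx2 period is not an input."""
    worst = 0
    for k in range(cycles):
        e1 = k * p1 + work1
        worst = max(worst, next_rx_slot(rx_period, e1) - e1)
    return worst


if __name__ == "__main__":
    for p2 in (20, 40, 60, 80, 100):
        blocking = max_tx1_latency_blocking(20, p2, 10)
        latest = max_tx1_latency_latest_value(20, 10)
        print(f"tx2 period {p2:3d} ms: blocking {blocking} ms, latest-value {latest} ms")
```

In this toy model, the worst-case tx1 latency of the blocking variant grows as the tx2 period is increased, while the latest-value variant does not take the tx2 period as an input at all. The absolute numbers are artifacts of the model and should not be compared with the measured results.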
IN1 Message Interval vs. TX2 Period#
Similarly, the maximum message interval for tx1 (at the in1 port of rx) increases with tx2's period when using the default buffer. The async lock-free buffer keeps the interval stable.
TX2 Message Latency vs. TX1 Period#
The same trend is visible here. tx2's message latency is affected by tx1's period with the default buffer, but not with the async lock-free buffer.
IN2 Message Interval vs. TX1 Period#
The message interval for tx2 (at the in2 port of rx) remains stable with the async lock-free buffer, independent of tx1's period.
Conclusion and Guidance#
The asynchronous lock-free buffer connection between operators enables truly independent execution when using the SCHED_DEADLINE Linux scheduling policy. This buffer type allows each operator to maintain its specified runtime and period without interference from other operators in the pipeline.
Key Insights:
- With async lock-free buffer: Operators run independently with their configured SCHED_DEADLINE runtime and periods, achieving predictable real-time performance.
- With default buffer (DoubleBuffer): Operators become coupled, so one operator's performance can be impacted by another operator's behavior, even when the SCHED_DEADLINE policy is applied.
Developer Recommendations:
1. Use async lock-free buffers when implementing soft real-time applications with SCHED_DEADLINE scheduling to ensure predictable operator performance.
2. Avoid default buffers in SCHED_DEADLINE scenarios where operator independence is critical for meeting real-time constraints.
3. Using the default buffer with the SCHED_DEADLINE policy means satisfying both the constraints of the double buffer and the periodic execution of SCHED_DEADLINE. Depending on the application, this may not be desirable.
4. Using SCHED_DEADLINE for a chosen few operators (for example, source operators in a DAG) along with the default buffer may provide a good balance, because it gives predictable execution for those operators while allowing the rest of the application to run normally.
5. Test both buffer types during development to understand the performance implications in your specific use case, especially when using the SCHED_DEADLINE policy.
6. Monitor message latency and intervals to verify that operators maintain their intended timing characteristics.