When designing a real-time system, it is of the utmost importance to ask: How fast can the software react to an external event delivered by the hardware components? How much this response time varies? Which is the worst response time?
Despite the dependence on which computations need to be done by the real-time software to solve the problem at hand, there is a lower bound on the response time set by communication latency. External events are usually delivered asynchronously, by the means of interrupt requests (IRQs). Therefore, it is important to measure how long it takes for the real-time operating system to answer IRQs.
We measure IRQ latency in a relatively modern computer running the RTAI real-time OS. Despite having a recent motherboard, this computer still possesses a parallel port (LPT) interface, which enables us to compare IRQ delivery performance between PCI Express and LPT interfaces.
Our PCI Express interface is implemented by an Altera DE4 board, programmed with an project designed employing the standard tool flow (Altera Qsys). This brings us another interesting question – does the PCIe to Avalon bus abstraction provided by Altera’s tool flow impose significant overhead on the system? Although we do not compare our design to any custom solution communicating directly through the PCIe protocol, we are able to assess if IRQs are answered in a reasonable time when compared to the traditional LPT interface.
The complete source code needed for conducting the experiments here described is available at the rtai-irq-latency repository.
Similar work
Measuring IRQ latency for a LPT interface in RTAI is an old and well-known experiment:
-
Captain.at had a very popular code example for measuring LPT IRQ latency using the
rt_get_time_ns
routine. -
Aldo Núñez demonstrates how to reproduce the same kind of measurement employing an oscilloscope.
Here we employ both measurement techniques, take a statistically significant amount of samples, and reproduce the experiments both for LPT and PCIe interfaces.
Materials
Hardware
- ASUSTeK P8P67-M PRO motherboard, Intel Core i3-2120 CPU, 8 GB DDR3 1333MHz RAM
- Altera DE4-230 FPGA development board
Instruments
- Tektronix TDS 460A oscilloscope
Software
- Ubuntu 15.04
- Linux kernel 3.4.55
- RTAI 3.9.265.gd99c55e
¹
start_rt_timer
inside a RTAI module. Therefore we
resorted to using the same RTAI version as adopted by the Linux CNC project,
which we knew worked with this hardware. This is already the second machine on
which we observe this problem. We still need to report and help hunting
this bug.
Methods
IRQ echoer design
The LPT IRQ echoer consists simply on a parallel port connector with pin 2 (Data Out 0) connected to pin 10 (ACK). An IRQ is produced on the signal’s positive edge.
The PCIe IRQ echoer is implemented by the Bluespec SystemVerilog module
presented below. This module exposes two 32-bit addresses which are connected
to bar0
in Qsys. When any value is written to the first address, an IRQ is
produced, and the value is held in an internal register. When the second
address is read, the value of this internal register is returned, and the IRQ
is cleared.
The IRQ flag is also exposed via an irqflagtap
signal, which is configured as
a conduit in Qsys, allowing it to be routed to a GPIO pin and measured with an
oscilloscope. The figure below shows the connections of our Qsys system. See
Altera’s tutorial
to learn how to create the project from scratch in Quartus. The main difference
of our system from the tutorial’s one is that we configured both bar0
and
bar2
as 32 bit Non-Prefetchable.
Measurement by oscilloscope
In LPT, the signal is measured directly from the ACK pin.
In PCIe, it is measured from the GPIO pin which exposes irqflagtap
.
Measurement by software
A periodic real-time thread running at each \(100\mu\mathrm{s}\) is implemented
by the thread_main
function both in PCIe and LPT modules. It registers
the current time instant using rt_get_time_ns
and then sends a signal to
the hardware asking for an IRQ to be sent. After the IRQ is dispatched by the
hardware, it is handled by the irq_handler
real-time interrupt service routine.
As shown by the figures below, the first thing that this routine does (after
the conventional C function prologue) is measuring the current time instant.
These two measured time instants are subtracted from each other in order to
compute the latency.
System load
During the measurements, I/O load was kept high by running
in a terminal, and the following in another one:
Results
Measurement by oscilloscope
The oscilloscope was kept in the infinite persistence display mode, and data was
collected during 1 minute
Measurement by software
Data was collected during 14 minutes and plotted to a histogram. The vertical dashed lines represent the minimum and maximum latency values observed for each interface. Please note that the vertical axis has a logarithmic scale, therefore it is not possible to distinguish between a single occurrence and no occurrences at all.
Discussion
Consistency between measurement techniques
Timing diagram
The timing diagram above explains why latency measurements done with the oscilloscope (\(t_\mathrm{HW}\)) should be approximately equal to the ones done by the real-time software (\(t_\mathrm{SW}\)). The software starts to count time just before writing to the bus, but the signal is only observed by the external hardware after a certain time \(t_\mathrm{rw}\) (time to read/write to the bus). However, after the software receives the IRQ and stops counting time, approximately the same \(t_\mathrm{rw}\) delay exists for the signal to stop being observed by the external hardware.
LPT
However, looking at the LPT interface results, an additional delay of about
\(2\mu\mathrm{s}\) seems to exist in measurements done with the oscilloscope.
This is explained by the additional inb
instruction which is used in the
irq_handler
routine to verify if the IRQ has really been dispatched by the
parallel port (and not by another device which might be sharing the same IRQ).
This effectively increases to \(2 \times t_\mathrm{rw}\) the time for the
oscilloscope to stop observing a digital high signal after the IRQ is delivered.
As in our particular system IRQ 5 (used by LPT) was not shared with other
devices (as confirmed by looking at /proc/interrupts
), we did another
experiment after removing this check, by changing the IRQ handler to:
The figure below superposes measurements done using the oscilloscope with and
without the inb
instruction included in the IRQ handler. In the
experiment using the routine without inb
, the oscilloscope measurements are
very consistent with the software-measured histograms. We conclude that
\(t_\mathrm{rw} \approx 2\mu\mathrm{s}\) for the LPT interface.
PCIe
Contrary to the LPT interface, we designed our PCIe interface in such a way
that we can verify if the interrupt was really caused by our PCIe board and
clear the IRQ flag in a single operation. Whereas in LPT we first need to call
inb
to check if the IRQ is ours then call outb
to clear the relevant
data output bit, in our PCIe hardware a single read to the 0x04
register
completes both tasks.
Latency and jitter results
If we consider minimum latency, mean latency and jitter, the PCIe interface results were clearly superior to LPT. Whereas for LPT we measured \(5.7\mu\mathrm{s}\) minimum latency, \(8.1\mu\mathrm{s}\) mean latency and \(0.8\mu\mathrm{s}\) standard deviation, PCIe presented, respectively, \(4.8\mu\mathrm{s}\), \(6.4\mu\mathrm{s}\) and \(0.5\mu\mathrm{s}\). However, the software-measured data caught a particularly strange behavior which was not visible in oscilloscope measurements.
A fraction of about \(10^{-5}\) of the PCIe latency measurements behaved as outliers, forming a secondary distribution of values larger than the observed LPT latencies. Thus the PCIe interface was not better than LPT in the worst-case timing, a behavior which was not expected by us. We were still not able to trace back the reason, since the IRQ to which the PCIe board was attributed in our system was not shared with other devices, thus there is not any immediately obvious source of hardware resource contention.
Comments powered by Talkyard.