Field-programmable gate arrays (FPGAs) are no longer merely convenient interconnect layers between chips in a system. In software defined radios, FPGAs are increasingly used as a general-purpose computational fabric for hardware acceleration units that boost performance while lowering cost and power requirements.
Typical implementations of software defined radio (SDR) modems include a general-purpose processor (GPP), a digital signal processor (DSP) and an FPGA. Beyond its traditional role, the FPGA fabric can offload the GPP or DSP with application-specific hardware acceleration units. Soft-core microprocessors can have their core extended with custom logic, or separate hardware acceleration coprocessors can be added to the system. Furthermore, because general-purpose routing resources are available in the FPGA, these hardware acceleration units can run in parallel to further increase the total computational throughput of the system. It is instructive to compare three distinct types of hardware acceleration units and to measure their performance against software implementations.
Software Defined Radio
With the proliferation of wireless standards, future wireless devices will need to support multiple air interfaces and modulation formats. SDR technology enables such functionality in wireless devices by using a reconfigurable hardware platform across multiple standards.
SDR is the underlying technology behind the Joint Tactical Radio System (JTRS) initiative to develop software programmable radios that enable seamless, real-time communication across the United States military services and with coalition forces and allies. The functionality and expandability of JTRS are built upon an open architecture framework called the software communications architecture (SCA). JTRS terminals must support dynamic loading of any one of more than 25 specified air interfaces or waveforms, which are typically more complex than those used in the civilian sector. Meeting all these requirements in a reasonable form factor requires extensive processing power of different kinds. For that reason, most architectures use a GPP, a DSP and an FPGA.
SDR System Architecture
The GPP, DSP and FPGA are general-purpose processing resources that can be used for different parts of the overall SDR system. Figure 1 shows an example architecture with the typical functions found in SDR divided across these devices. However, there is significant overlap among the three elements. For example, an algorithm running on the DSP could be implemented in the GPP, albeit more slowly, or rewritten in HDL and run in an FPGA as a coprocessor or hardware acceleration unit.

Hardware Acceleration
FPGA resources can be used for hardware acceleration in several ways, but there are three basic architectures: custom instructions, custom peripherals as coprocessors and dynamically reconfigurable application-specific processors. Each of these methods has different features and unique benefits. Understanding how and where to use each one helps the system architect make better use of the FPGA resources for offloading the DSP and GPP in an SDR application.
Soft-Core Processors and Custom Instructions
With the advent of large FPGAs came small, powerful processors that could be embedded in an FPGA. These “soft-core” processors are configurable blocks of intellectual property (IP) that can be downloaded into an FPGA and used like any other embedded microprocessor. They come with industry-standard toolchains including compilers, instruction-set simulators, a full suite of software debug tools and an integrated development environment. This toolset is so familiar to embedded software engineers that it makes little difference that the processor is downloaded to the FPGA as a bitstream. Unlike fixed processors, however, soft-core processors are highly flexible. Before downloading the processor, a designer can choose different configuration options, trading off size for speed. A designer can also add a variety of peripherals for memory control, communications, I/O and so forth.
Custom instructions take the flexibility of soft-core processors one step further: they are algorithm-specific hardware additions to the soft-core microprocessor’s arithmetic logic unit (ALU). A new hardware instruction replaces a time-critical piece of an algorithm, recasting that part of the software as a hardware block. A RISC microprocessor with custom instructions blurs the division between RISC and CISC, because a custom instruction unit can be a multi-cycle hardware block that performs a quite complex algorithm, embedded in a RISC processor whose “standard instructions” take a single clock cycle. Furthermore, several custom instructions can be added to an ALU, limited only by the FPGA resources and the number of open positions in the soft-core processor’s op-code table. Figure 2 depicts the use of a custom instruction to extend the ALU of Altera’s Nios II soft-core microprocessor.
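From the software side, replacing a time-critical piece of an algorithm with a custom instruction can look like the following minimal sketch. It is illustrative only and is not taken from the design in Figure 2. It assumes a hypothetical bit-reversal custom instruction, a hypothetical opcode index BITREV_CI_N and a hypothetical build flag USE_BITREV_CI, and it assumes the GCC Nios II built-in __builtin_custom_ini (one int operand, int result) as the way to issue the instruction. On any other target the wrapper falls back to the plain C routine, so the calling code does not change.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opcode index of the custom instruction; the real value is
 * assigned when the soft-core processor is generated (assumption). */
#define BITREV_CI_N 0

/* Software version: reverses the 32 bits of x using shifts and masks. */
uint32_t bitrev32_sw(uint32_t x)
{
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
    x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);
    x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);
    return (x >> 16) | (x << 16);
}

/* Single entry point for callers: uses the hardware block when the build
 * targets Nios II and the (hypothetical) flag is set, else the C routine. */
uint32_t bitrev32(uint32_t x)
{
#if defined(__NIOS2__) && defined(USE_BITREV_CI)
    return (uint32_t)__builtin_custom_ini(BITREV_CI_N, (int)x);
#else
    return bitrev32_sw(x);
#endif
}

/* Small self-check that doubles as a usage example. */
void bitrev32_selfcheck(void)
{
    assert(bitrev32(0x00000001u) == 0x80000000u);
    assert(bitrev32(0xF0F0F0F0u) == 0x0F0F0F0Fu);
    assert(bitrev32(bitrev32(0x12345678u)) == 0x12345678u);
}
```

Keeping the software routine behind the same function name makes it possible to compare the two implementations on the target and to build the code on a host machine for testing.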

When should custom instructions be used? The most efficient use occurs when the algorithm to be accelerated is a relatively atomic operation that is called often and operates on data stored in local registers. Floating-point instructions are good examples. Floating-point arithmetic instructions can be implemented as library subroutines that the compiler automatically invokes on processors without dedicated floating-point instruction hardware. These floating-point algorithms take many clock cycles to execute. In an application they are typically used throughout the software code rather than localized to a few function calls. However, these algorithms can also be implemented as custom instructions extending a soft-core microprocessor’s ALU. Table 1 provides a comparison between several software library routines and the same function using a custom instruction. Note that even in this case the results may vary dramatically, depending on the design considerations for the custom instruction such as the amount of pipelining that is chosen in the hardware implementation.
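The point that a custom instruction pays off most when the operation is called often can be made concrete with a rough Amdahl's-law estimate. The sketch below is not from the article: the function name and parameters are hypothetical, and any cycle counts passed in are placeholders to be replaced by measured values such as those in Table 1.

```c
#include <assert.h>

/* Estimated whole-program speedup when only part of the run time is
 * accelerated (Amdahl's law).
 *   accel_fraction: share of total run time spent in the routine (0..1)
 *   sw_cycles:      cycles per call for the software library routine
 *   hw_cycles:      cycles per call for the custom instruction
 * All three inputs are assumptions supplied by the caller. */
double overall_speedup(double accel_fraction, double sw_cycles, double hw_cycles)
{
    assert(accel_fraction >= 0.0 && accel_fraction <= 1.0);
    assert(sw_cycles > 0.0 && hw_cycles > 0.0);

    double local_speedup = sw_cycles / hw_cycles;
    return 1.0 / ((1.0 - accel_fraction) + accel_fraction / local_speedup);
}
```

With placeholder inputs, a routine that is ten times faster in hardware but accounts for only half of the run time gives an overall speedup of about 1.8. The same routine accounting for nearly all of the run time approaches the full factor of ten, which is why operations used throughout the code are the best candidates.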

