COTS Journal

Hurdles Fall for Doing Intel x86 DSP on OpenVPX

By: Ian Stalker, Curtiss-Wright Controls Embedded Computing

Enhanced vector processing and Serial RapidIO support are opening the way for Intel processing to meet military embedded DSP needs.

With Intel’s introduction this month of its new Intel Core i7-2715QE next-generation quad-core processor, the design of x86-based embedded military digital signal processing (DSP) systems takes a great leap forward. The new processor, which is faster and more power efficient than the previous generation, also features the new 256-bit wide Intel Advanced Vector Extensions (AVX) floating-point instructions (Figure 1). DSP algorithms rely heavily on the throughput of vector instructions and so will benefit greatly. Before the introduction of Intel AVX, vectorized signal processing functions on Intel processors were limited to 128 bits.

Of equal interest to signal processing system designers is the fact that with this platform Serial RapidIO (S-RIO), the preferred fabric for the processor-to-processor communications required by demanding DSP systems, is now supported for the first time, thanks to IDT’s upcoming PCIe Gen2 to S-RIO Gen2 protocol conversion bridge devices. This brings S-RIO, the fabric of choice and one well supported by the OpenVPX/VITA 65 standard, to Intel-based open architecture system designs.

Now the latest x86 processor can be used in a RapidIO-based network that supports reliable packet transmission in any system architecture, delivers low and predictable latencies, and provides the benefits of RapidIO messaging. These characteristics are ideal for the large peer-to-peer clusters of processors used in complex signal processing applications.

The DSP Sweet Spot

For embedded DSP designers, Intel’s most recent micro-architecture (code-named Sandy Bridge) further establishes the x86 architecture as the leading candidate for the most demanding compute-intensive multiprocessor systems. The Intel quad-core processor boasts numerous micro-architecture enhancements and design features over Intel’s previous processors. For example, the new processor is faster than earlier processors at the same clock speed because of its more sophisticated caching and branch prediction. This platform also delivers quad-core processing at power levels that match the stringent requirements of rugged embedded military environments: four cores at 45W.

But the single greatest improvement for DSP system performance delivered by Intel’s latest platform is the new AVX processing unit. In recent years Intel has demonstrated its ongoing commitment to high-performance vectorized processing by investing in continual enhancements to AVX’s predecessor, Intel Streaming SIMD Extensions (Intel SSE), a 128-bit wide processing unit capable of simultaneously operating on four 32-bit floating-point values. Intel SSE also featured support for double-precision floating point, a feature that was not available in AltiVec. In Intel’s multicore processors each core was given its own SSE unit, so the raw floating-point performance scaled with the number of cores. In the new platform Intel has upgraded the SSE approach with AVX, doubling the vector width to 256 bits, or eight 32-bit floating-point values per operation.
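The width arithmetic behind that doubling can be sketched in a few lines of Python. This is an illustrative calculation only: the function name `lanes_per_register` is hypothetical, and the register and element widths are the ones quoted above.

```python
# Illustrative only: how many floating-point values fit in one vector register.

def lanes_per_register(register_bits: int, element_bits: int) -> int:
    """Number of elements a single vector instruction operates on."""
    return register_bits // element_bits

SSE_BITS = 128  # Intel SSE register width
AVX_BITS = 256  # Intel AVX register width

sse_single = lanes_per_register(SSE_BITS, 32)  # 32-bit single precision
avx_single = lanes_per_register(AVX_BITS, 32)
sse_double = lanes_per_register(SSE_BITS, 64)  # 64-bit double precision
avx_double = lanes_per_register(AVX_BITS, 64)

print(f"single precision: SSE {sse_single} lanes, AVX {avx_single} lanes")
print(f"double precision: SSE {sse_double} lanes, AVX {avx_double} lanes")
```

In both precisions the AVX register holds twice as many values as the SSE register, which is where the doubling of peak vector throughput comes from.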

This doubling of vector processing performance is a significant milestone in DSP system design. DSP algorithms used in critical military applications such as radar, SIGINT and image processing depend on the precision achieved with floating-point numbers as well as on processing speed (Figure 2). The new Intel Core i7 effectively doubles peak vector performance over previous approaches. For typical-size 1D and 2D FFTs, the improvement gained by AVX over SSE is approximately 1.5X to 2X. The AVX instruction set was also designed to support future extensions, which hints at wider implementations to come.

Serial RapidIO for Data Movement

Further helping to establish Intel as the ideal platform for DSP applications is the addition of S-RIO support. For embedded and high-performance applications, S-RIO as yet has no peer for processor-to-processor communications in multiprocessor systems. But prior to this generation of Core i7, there was no S-RIO support for Intel platforms, which limited the viability of Intel architecture in DSP multiprocessor systems. Solutions have been available for InfiniBand, which is popular in the cluster computing world but is not embraced in military applications, and for Gigabit Ethernet. For single board computers, where the requirement is typically a single processor communicating with I/O, these fabrics have been sufficient. The lack of S-RIO support, though, deprived would-be Intel-based DSP system designers of the ability to select the multiprocessor fabric of choice.

IDT’s upcoming PCI Express (PCIe) Gen2 to S-RIO Gen2 bridge provides the first solution for S-RIO support on Intel-based platforms. The new S-RIO Gen2 switches provide 3X the aggregate bandwidth and are twice as fast per port as the earlier RapidIO 1.3-based switches. Signaling rates for switch ports increase from 3.125 Gbit/s to Gen2’s 6.25 Gbit/s, resulting in 20 Gbit/s per port in the switch fabric. IDT’s bridge will provide a mapping from PCIe Gen2 into this S-RIO Gen2-based switch on board and into the backplane.
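The per-port figure follows from the signaling rate if two things are assumed that the text above does not state: that a switch port is four lanes wide, and that the links use 8b/10b encoding, which carries 8 data bits in every 10 line bits. Under those assumptions, a rough Python sketch (the function name `port_data_rate_gbps` is hypothetical) gives:

```python
# Rough sketch: usable data rate of one S-RIO switch port.
# Assumptions (not stated in the article): 4 lanes per port, 8b/10b encoding.

def port_data_rate_gbps(lane_rate_gbaud: float, lanes: int = 4) -> float:
    """Data rate in Gbit/s after removing 8b/10b line-coding overhead."""
    return lane_rate_gbaud * lanes * 8 / 10

gen1_port = port_data_rate_gbps(3.125)  # RapidIO 1.3-era signaling rate
gen2_port = port_data_rate_gbps(6.25)   # S-RIO Gen2 signaling rate

print(f"Gen1 port: {gen1_port:.1f} Gbit/s")
print(f"Gen2 port: {gen2_port:.1f} Gbit/s")
print(f"Per-port speedup: {gen2_port / gen1_port:.1f}x")
```

With these assumptions the Gen2 port works out to 20 Gbit/s, twice the Gen1 figure, which matches the per-port numbers quoted above.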

Better Than FPGAs

IDT’s bridge will support the two main S-RIO transfer modes: memory-mapped transfers and S-RIO messaging. S-RIO bridges implemented in FPGAs don’t support high-performance messaging, a feature that maps directly to higher-level software APIs such as MPI. Another plus offered by the IDT silicon is the inclusion of DMA engines that speed data movement while off-loading the host processor. Intel processors typically don’t have DMA engines on-chip, but depend instead on the peripheral chip to move data.

Without a DMA engine, moving data can require a large amount of the host processor’s attention. A multicore processor might then have one of its cores (and the associated power) largely consumed by moving data, a burden made heavier because the transfers must be performed in software.

IDT’s bridge is physically much smaller and lower power than today’s 10 Gigabit Ethernet (10GbE) alternatives while being 1.6x faster, which makes it a good fit for SWaP-constrained systems. For in-the-box processor-to-processor connections, 10GbE is over-featured, making both the controller and switch chips larger and slower than the equivalent S-RIO devices. Moreover, in a 10GbE network, end-to-end packet transmission over a reliable transport could take several milliseconds.

Space-Constrained Military Systems

Another advantage of S-RIO for space-constrained military systems is its ability to support all topologies, including both distributed switch and centralized switch architectures. Distributed switch systems (an example is the VITA 65 BPK6-CEN05-11.2.5-n backplane profile) make use of the local S-RIO switch on each card and thus avoid the need for a separate switch card. For example, in a system using a ½ ATR Short enclosure (four 1-inch slots), eliminating the switch card frees one of the four slots, saving 25 percent of the space and a considerable amount of power. For large systems, centralized switch architectures are often preferred, and S-RIO is equally adept at this approach. Many leading vendors offer S-RIO switch card solutions.

An example of a high-performance DSP engine designed to take full advantage of the latest offering for Intel’s Core i7 is the new CHAMP-AV8 (Figure 3) from Curtiss-Wright Controls Embedded Computing. The CHAMP-AV8 is the first rugged, high-performance OpenVPX DSP engine based on the Intel Core i7-2715QE. It also supports IDT’s upcoming Gen2 PCIe-to-S-RIO protocol conversion bridge. The CHAMP-AV8’s pair of quad-core processors delivers performance rated at up to 269 GFLOPS. With IDT’s bridge chip, the CHAMP-AV8 supports Gen2 S-RIO and Gen2 PCIe interfaces, which enables it to deliver triple the bandwidth of first-generation VPX products, with up to 240 Gbits/s of fabric performance. This ensures that application performance can scale commensurately with the much higher CPU performance.
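The 269 GFLOPS rating is consistent with a simple peak-throughput estimate. The sketch below assumes a 2.1 GHz core clock and 16 single-precision floating-point operations per core per cycle (one 8-wide AVX add plus one 8-wide AVX multiply). Neither figure is given in this article, so both should be treated as assumptions, and the function name `peak_gflops` is hypothetical.

```python
# Back-of-the-envelope peak single-precision throughput for a dual
# quad-core board. Clock rate and FLOPs per cycle are assumptions.

PROCESSORS = 2                  # pair of processors (from the article)
CORES_PER_PROCESSOR = 4         # quad-core (from the article)
CLOCK_MHZ = 2100                # assumed core clock, 2.1 GHz
FLOPS_PER_CORE_PER_CYCLE = 16   # assumed: 8-wide add + 8-wide multiply

def peak_gflops(processors: int, cores: int, clock_mhz: int,
                flops_per_cycle: int) -> float:
    """Theoretical peak in GFLOPS (MHz * FLOPs/cycle gives MFLOPS)."""
    return processors * cores * clock_mhz * flops_per_cycle / 1000

peak = peak_gflops(PROCESSORS, CORES_PER_PROCESSOR,
                   CLOCK_MHZ, FLOPS_PER_CORE_PER_CYCLE)
print(f"Estimated peak: {peak:.1f} GFLOPS")
```

Under these assumptions the estimate is 268.8 GFLOPS, which rounds to the rated figure. It is a theoretical peak; sustained performance on real DSP kernels will be lower.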

Curtiss-Wright Controls Embedded Computing
Ashburn, VA.
(703) 779-7800.
[www.cwcembedded.com].

© 2009 RTC Group, Inc., 905 Calle Amanecer, Suite 250, San Clemente, CA 92673