General-purpose computing on graphics processing units (GPGPU) is beginning to gain traction. This disruptive technology uses the latest crop of high-performance graphics processors to handle general-purpose processing tasks. GPUs have potential in application areas including target tracking, image stabilization and SAR (synthetic aperture radar) simulation. Sensor processing and software defined radio are also well suited to this kind of processing. Board-level products have emerged specifically for GPGPU computing in a number of form factors, including OpenVPX.
At the forefront of this wave is NVIDIA, the graphics technology firm that popularized the term “graphics processing unit.” Graphics processing units, or GPUs, are programmable floating-point graphics-rendering engines primarily used in personal computers, workstations and gaming consoles. But thanks to architectural advancements in recent years, the scope of applications to which GPUs can be applied has grown dramatically. For traditional signal processing algorithms like the FFT (Fast Fourier Transform), they provide unprecedented performance, particularly performance per watt. For UAV (Figure 1) applications such as ISR (intelligence, surveillance and reconnaissance), the added compute capability of GPGPUs translates directly into more capable detection systems, increased UAV autonomy and increased survivability. Reductions in the size, weight and power (SWaP) of the compute platform result in greater range, greater payload and greater loiter time.
An Air Force Staff Sergeant prepares the RQ-4 Global Hawk for launch using the vehicle test controller while reviewing technical orders at Beale Air Force Base, Calif.
Feeding this notion of GPUs as general-purpose processing engines, NVIDIA developed a parallel computing architecture called CUDA (Compute Unified Device Architecture) that addresses a key weakness of FPGA parallel processing systems: the complexity of programming them. CUDA exposes the massively parallel processing capabilities of NVIDIA GPUs to software developers through industry-standard programming languages. Programmers use “C for CUDA,” which is the C language with NVIDIA extensions, to write code that runs on the GPUs.
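The data-parallel style that CUDA encourages can be sketched without a GPU. The example below is a pure-Python emulation, not CUDA code: it assumes the usual scheme in which each logical thread derives a global index from its block and thread numbers and handles one data element. The names saxpy_kernel, launch and saxpy are hypothetical and chosen only for illustration.

```python
# Pure-Python emulation of a CUDA-style kernel launch (illustrative only;
# no GPU is used, and all helper names here are hypothetical).

def saxpy_kernel(block_idx, block_dim, thread_idx, a, x, y, out):
    """Body executed by one logical thread: out[i] = a * x[i] + y[i]."""
    i = block_idx * block_dim + thread_idx  # global element index
    if i < len(x):                          # guard threads past the end of the data
        out[i] = a * x[i] + y[i]


def launch(kernel, grid_dim, block_dim, *args):
    """Invoke the kernel once per (block, thread) pair.

    The loops run sequentially here; on a GPU these invocations would be
    spread across the thread processors and run in parallel.
    """
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


def saxpy(a, x, y, block_dim=4):
    """Compute a * x + y element-wise using the emulated launch."""
    out = [0.0] * len(x)
    grid_dim = (len(x) + block_dim - 1) // block_dim  # ceiling division
    launch(saxpy_kernel, grid_dim, block_dim, a, x, y, out)
    return out
```

Because no thread reads another thread's output, the invocations can run in any order, which is what lets the work spread across many thread processors.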
Aside from serving applications in radar, signals intelligence, and video surveillance and interpretation, GPUs based on the CUDA architecture have potential in other application areas including target tracking, image stabilization, SAR (synthetic aperture radar) simulation, pattern recognition, video encoding/decoding, graphics rendering, object recognition, in-crowd behavioral monitoring and analysis, cryptography, sensor processing and software defined radio. GE Intelligent Platforms, a major prime contractor in the military/aerospace industry, has evaluated the CUDA architecture in a radar system and found that a performance improvement of 15x is achievable with minimal reprogramming effort.
Where GPGPU Technology Shines
One advantage of GPUs is their highly parallel nature. Some GPUs have as many as several hundred thread processors. In military applications, that parallelism is helpful in signal processing tasks that can be expressed as vector and matrix operations or linear algebra. Image processing applications are likewise well suited because the native architecture of the GPU is geared toward handling textures, surfaces and shaders.
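To make the "vector and matrix operations" point concrete, here is a minimal sketch that writes a discrete Fourier transform as a matrix-vector product. It is a direct O(n²) DFT in plain Python rather than an FFT, and the function names are assumptions made for illustration; the point is only that every output element is an independent dot product, which is the kind of work that maps onto many thread processors.

```python
import cmath


def dft_matrix(n):
    """Build the n-by-n DFT matrix W, with W[k][j] = exp(-2*pi*i*j*k/n)."""
    return [[cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n)]
            for k in range(n)]


def matvec(matrix, vector):
    """Multiply a matrix by a vector.

    Each output element is a dot product of one row with the vector and
    depends on no other output element, so all rows could be computed
    in parallel.
    """
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def dft(samples):
    """Direct DFT of a sequence, expressed as a matrix-vector product."""
    return matvec(dft_matrix(len(samples)), samples)
```

For example, dft([1, 0, 0, 0]) yields four values equal to 1, and dft([1, 1, 1, 1]) concentrates all the energy in the first bin.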
Because one CPU can host several GPUs, data streams arriving from sensors must in many situations be staged through host processor memory before the GPU can access them. This can add latency to the data stream in some applications. Several techniques mitigate this, among them the use of page-locked host memory, direct access of host memory from the GPU, and direct PCIe transfers from the input device to GPU memory. Devices that support such functionality include FPGAs, InfiniBand interfaces, 10 Gbit Ethernet, video capture and video encoder devices. Even with these techniques, some applications with very tight latency constraints, such as control loop systems, may not be a good fit for GPGPU.
The need for double precision floating-point operation may dictate which GPUs are suitable: some do not support double precision at all, some have reduced capability and some have fully fledged support. Another deciding factor may be the need for Error Correcting Code (ECC) on the GPU memory, caches and register files: not all devices have ECC. Access to global memory on GPUs has a large latency penalty. That is nothing new: most systems rely on data locality for performance. Many applications can either tolerate this latency or mitigate it through techniques such as overlapping data transfers with processing, and scheduling many more execution threads than there are physical processors (thread occupancy) so that some threads compute while others wait on memory. Switching between threads carries minimal overhead. Applications that fit this model share common characteristics: large data sets, a high computational intensity (each data point undergoes multiple calculations) and a tolerance for some degree of latency.
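The benefit of overlapping transfers with processing can be estimated with a simple two-stage pipeline model. The sketch below is an idealized calculation under stated assumptions: the data is split into equal chunks, transferring one chunk always takes t_transfer and processing it always takes t_compute, the two stages can fully overlap, and the return transfer of results is ignored. The function names and the timings used in the example are hypothetical, not measurements.

```python
def serial_time(n_chunks, t_transfer, t_compute):
    """Total time when each chunk is transferred, then processed, in turn."""
    return n_chunks * (t_transfer + t_compute)


def pipelined_time(n_chunks, t_transfer, t_compute):
    """Total time when the transfer of chunk k+1 overlaps processing of chunk k.

    The first chunk pays both costs in full; every later chunk adds only
    the cost of the slower stage.
    """
    if n_chunks == 0:
        return 0.0
    return t_transfer + t_compute + (n_chunks - 1) * max(t_transfer, t_compute)


def speedup(n_chunks, t_transfer, t_compute):
    """Ratio of serial to pipelined time for the same workload."""
    return (serial_time(n_chunks, t_transfer, t_compute)
            / pipelined_time(n_chunks, t_transfer, t_compute))
```

With ten chunks at a hypothetical 2 ms transfer and 3 ms compute each, the serial schedule takes 50 ms and the overlapped one 32 ms. The gain approaches a factor of two only when the two stage times are nearly equal, which is one reason high computational intensity matters.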
GPGPUs on OpenVPX
Among the first to bring GPGPU computing to the embedded realm was Mercury Computer Systems. Its latest offering is an OpenVPX, dual-GPU, conduction-cooled subsystem. This subsystem is currently deployed in an embedded rugged defense surveillance platform, performing processing, exploitation and dissemination (PED). The system is powered by the Ensemble 6000 Series GSC6200. The subsystem currently delivers performance in the Teraflops range, and the incorporation of GPUs enables the solution to be delivered in an optimized size, weight and power (SWaP) footprint. Packaging technology on the GSC6200 leverages the easy-to-upgrade MXM GPU form factor, which enables users to rapidly upgrade and deploy the latest and fastest GPUs from ATI or NVIDIA.
GE Intelligent Platforms’ latest GPGPU offering is its IPN250, the second product from GE to feature NVIDIA’s CUDA technology. The IPN250 (Figure 2) combines NVIDIA’s GT240 96-core GPU with an Intel Core2 Duo processor operating at 2.26 GHz and 8 Gbytes of DDR3 SDRAM to deliver up to 390 Gflops of performance per card slot, depending on the application. It is designed from the ground up to be compliant with the OpenVPX standard. It is also VITA48/REDI compliant, allowing it to be deployed in the harshest environments: build options for air-, spray- and conduction-cooling are available.
The IPN250 combines NVIDIA’s GT240 96-core GPU with an Intel Core2 Duo processor operating at 2.26 GHz and 8 Gbytes of DDR3 SDRAM to deliver up to 390 Gflops of performance per card slot.
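A peak figure such as 390 Gflops is typically the product of core count, shader clock and floating-point operations per core per clock. The sketch below works that relationship backward for a 96-core part. The factor of three operations per core per clock (a multiply-add plus a multiply) is an assumption based on how peak numbers were commonly quoted for GPUs of this generation; it is not stated in this article, and neither is the shader clock, so the result is an implied value rather than a specification.

```python
def peak_gflops(cores, shader_clock_ghz, flops_per_core_per_clock):
    """Theoretical peak single-precision throughput in Gflops."""
    return cores * shader_clock_ghz * flops_per_core_per_clock


def implied_clock_ghz(target_gflops, cores, flops_per_core_per_clock):
    """Shader clock (GHz) that a quoted peak figure would imply."""
    return target_gflops / (cores * flops_per_core_per_clock)


CORES = 96                    # from the article
QUOTED_PEAK_GFLOPS = 390      # from the article ("up to 390 Gflops")
FLOPS_PER_CORE_PER_CLOCK = 3  # ASSUMPTION: multiply-add (2) plus multiply (1)

clock = implied_clock_ghz(QUOTED_PEAK_GFLOPS, CORES, FLOPS_PER_CORE_PER_CLOCK)
print(f"Implied shader clock: {clock:.3f} GHz")
```

Under that assumption the quoted peak corresponds to a shader clock of roughly 1.35 GHz. Sustained throughput depends on the application, as the "up to" wording suggests.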
The IPN250 feature set includes two primary data-plane 10 Gigabit Ethernet ports supporting multi-board switched fabric OpenVPX architectures. A 16-lane PCI Express gen2 interface on the P2 expansion plane provides high-speed interconnect for multi-board GPGPU clusters as well as system I/O to PCI Express-enabled sensor modules such as GE’s family of Xilinx Virtex5 and Virtex6 mezzanine cards. Two 1000Base-T and two 1000Base-Bx control plane ports are available, together with additional PCI Express, USB 2.0, SATA, COM ports, GPIO, audio and TV input. Video and multimedia are supported via dual-link DVI, HDMI and VGA ports connected directly to the NVIDIA GT240 device, catering to a wide range of interfaces.
Modular GPGPU Approach
Providing a modular approach to GPGPU computing, Wolf Industrial Systems recently rolled out a GPGPU solution in its MXC form factor (Figure 3). The MXC technology is a hybrid derivation of the XMC, MXM 3.0 and PMC specifications. MXC embedded modules are small (70 mm x 85 mm), conduction-cooled mezzanine boards designed to be used with carrier cards. These include, but are not limited to, 3U and 6U VPX, cPCI, VME64 and custom or COM Express baseboard configurations. To improve upon the successes of XMC, PMC, MXM 3.0, and the standard busing scheme of PCIe, the MXC specification utilizes the SAMTEC Searay connector to provide superior support during extreme shock and vibration situations, while enabling high-speed electrical signals through dedicated connectivity. Five hundred individual pins are defined for each video input and output, eliminating the need for signal multiplexing.
MXC cards enable users to create VPX, VME, Compact PCIe, COM Express and custom baseboard variations. They support both expanded video input and output, and provide parallel GPGPU processing.
The new WOLF MXC cards are smaller than XMC and PMC units, yet present over 500 pins, offering more than 100 combinations of video inputs and outputs. This generous pin configuration allows WOLF to create VPX, VME, Compact PCIe, COM Express and custom baseboard variations, enabling a family of products offering both expanded video input and output, parallel GPGPU processing, plus the capacity to upgrade to future technologies through drop-in replacement MXC modules. Current MXC cards use the AMD Radeon e4690, providing superior graphics capabilities coupled with video input and capture capabilities using Xilinx Spartan 6 and Virtex FPGAs.
2U Server Solution
When sheer compute density and off-the-shelf server compatibility are priorities, the rackmount server form factor ranks as a leading choice. Bringing GPGPU technology onto that form factor, One Stop Systems offers a system that integrates an AMD-based motherboard featuring dual “Istanbul” processors and eight GPU boards providing 10 Teraflops of compute power. The server can also accommodate a combination of GPU boards and SSD (solid-state drive) boards. Four GPU boards and four 640 Gbyte SSD boards provide 5 Tflops of GPU processing and 2.5 Tbytes of solid-state storage in addition to the compute power of the dual six-core processors. In addition, OSS has packed even more storage capacity into the system with four hot-swappable hard disk drives and an internal RAID controller. The server is powered by dual, redundant 1,500W power supplies and housed in a 2U-high chassis designed and manufactured to meet rigorous environmental demands.
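The mixed-configuration numbers follow from simple per-board arithmetic, sketched below. The per-board values are derived from the figures quoted above (10 Tflops across eight GPU boards, 640 Gbytes per SSD board). Treating the eight positions as interchangeable between GPU and SSD boards, and assuming throughput scales linearly with board count, are assumptions made for illustration; the function name is hypothetical.

```python
SLOTS = 8                      # board positions, from the eight-GPU configuration
TFLOPS_PER_GPU_BOARD = 10 / 8  # 10 Tflops quoted for eight GPU boards
GBYTES_PER_SSD_BOARD = 640     # quoted capacity per SSD board


def configuration(gpu_boards, ssd_boards):
    """Return (GPU Tflops, SSD Gbytes) for a mix of boards.

    ASSUMPTION: GPU throughput scales linearly with board count and any
    position can hold either kind of board.
    """
    if gpu_boards < 0 or ssd_boards < 0:
        raise ValueError("board counts must be non-negative")
    if gpu_boards + ssd_boards > SLOTS:
        raise ValueError(f"at most {SLOTS} boards fit in the system")
    return (gpu_boards * TFLOPS_PER_GPU_BOARD,
            ssd_boards * GBYTES_PER_SSD_BOARD)


tflops, gbytes = configuration(4, 4)
print(f"{tflops} Tflops, {gbytes} Gbytes ({gbytes / 1024} Tbytes)")
```

The four-plus-four mix gives 5 Tflops and 2,560 Gbytes, which matches the quoted 2.5 Tbytes when a Tbyte is counted as 1,024 Gbytes.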
Recently One Stop Systems expanded that offering with a rugged 2U expansion enclosure (Figure 4) that provides four or eight PCIe x16 Gen2 slots, two PCIe x16 cable interfaces, ample cooling, and an 850W power supply to support up to four GPU boards or other high-speed I/O cards, along with a dual PCIe x16 Gen2 host cable adapter and two one-meter PCIe x16 cables. The system includes a system monitor (fans, temperature, voltage). The operating temperature range is 0° to 35°C, while the storage temperature range is -40° to 85°C. The unit operates in 10 to 90 percent relative humidity (non-condensing) and at altitudes from 0 to 10,000 feet.
This rugged 2U expansion enclosure provides four or eight PCIe x16 Gen2 slots, two PCIe x16 cable interfaces, ample cooling, and an 850W power supply to support up to four GPU boards or other high-speed I/O cards.
GE Intelligent Platforms
Mercury Computer Systems
One Stop Systems
Wolf Industrial Systems