How FPGAs Address the Radiation Challenges of Space and Aviation Systems


Concerns about the radiation hardness of microelectronic components were once solely the domain of design teams working on spacecraft, satellites, and rockets. But today, the designers of many high-reliability, high-availability terrestrial applications must consider ground-level radiation effects when selecting components for their systems. 

Advanced driver assistance systems in the field of automotive electronics, along with applications in medical electronics, safety-critical industrial electronics, and high availability commercial applications such as data center servers and storage systems, all have emerging requirements to operate reliably in ground-level background radiation. This has encouraged some suppliers of microelectronic components to include some level of radiation hardening in their product designs. 

Designers of aviation and space systems have turned to field programmable gate arrays (FPGAs), known for their reprogrammability and radiation tolerance, to create on-board systems that meet the demanding performance needs of future missions. But to understand the suitability of these microelectronic devices in space, we must first understand the types of radiation effects they’re up against and the role traceability plays in ensuring these systems maintain their integrity. 

Types of Radiation Effects
Radiation effects can be divided into two categories – single-event effects and total ionizing dose effects. 

Single-event effects occur when a single sub-atomic particle interacts with a semiconductor. In ground-level and airborne electronics, they are most commonly caused by one of two sources: alpha particles emitted by impurities in the semiconductor material or its surrounding packaging materials, or atmospheric neutrons, which are created high in the atmosphere when radiation from space bombards atmospheric gases. 

In space, single-event effects are caused primarily by protons and heavy ions. Unimpeded by the atmosphere, and concentrated by the earth’s magnetic field, the protons and heavy ions encountered by earth-orbiting satellites are orders of magnitude more energetic and damaging than the neutrons and alpha particles encountered at ground level.

Single-Event Upsets
Single-event upsets (SEUs) occur when a flip-flop or an SRAM cell experiences a sudden small pulse of current due to the ionization and subsequent recombination of a region of semiconductor in the vicinity of a PN junction in the flip-flop or memory cell. This happens when a charged particle (alpha particle, proton, or heavy ion) passes through the semiconductor. Despite not having any electrical charge, neutrons cause this effect by colliding with a silicon atom in the semiconductor lattice, which causes a shower of charged particles to emerge from the collision and cause the ionization and recombination effect. 

The SEU alters the content of the flip-flop or SRAM cell. For example, a memory element that contained a logic 1 would contain a logic 0 after experiencing an SEU. The corrupted value remains until new data is written into the flip-flop or SRAM cell, which for a flip-flop may be as soon as the next clock cycle. At a system level, the result of an SEU could be relatively inconsequential. 

For example, an SEU that occurs in a communications datapath may have no effect at all, because the data being processed is already protected by forward error correction coding. On the other hand, an SEU that occurs in a state machine that is controlling the flight control surface actuators in an aircraft may have disastrous consequences if the state machine is not built with fault-tolerant design techniques. Each designer is responsible for understanding the safety-critical nature of their system and employing whatever fault-tolerant design techniques are necessary.
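One common fault-tolerant technique for state machines is to encode the state so that a single flipped bit always produces an illegal value that can be detected. The following is a minimal sketch of that idea using one-hot encoding. The state names, the three-bit register width, and the choice of IDLE as the safe fallback are all hypothetical and are not taken from any real flight control design.

```python
from enum import IntEnum


class State(IntEnum):
    """Hypothetical actuator controller states, one-hot encoded in 3 bits."""
    IDLE = 0b001
    EXTEND = 0b010
    RETRACT = 0b100


SAFE_STATE = State.IDLE  # assumed safe fallback for this illustration


def inject_seu(register: int, bit: int) -> int:
    """Flip one bit of a register value, modelling a single-event upset."""
    return register ^ (1 << bit)


def is_legal(register: int) -> bool:
    """A one-hot register is legal only if it matches a defined state."""
    return register in (s.value for s in State)


def recover(register: int) -> State:
    """Decode the register, falling back to the safe state if it is corrupt."""
    return State(register) if is_legal(register) else SAFE_STATE


if __name__ == "__main__":
    upset = inject_seu(State.EXTEND, 0)  # bit 0 flips while extending
    print(f"register={upset:03b} legal={is_legal(upset)} -> {recover(upset).name}")
```

With one-hot encoding, flipping the set bit yields all zeros and flipping any other bit yields two set bits, so every single-bit upset lands on an illegal value. Detection alone does not restore the previous state; a real design would also have to decide what the safe response is.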

Figure 1: FET Cross Section during Radiation Single-Event

Single-Event Functional Interrupts 

If an SEU occurs in a control register within an integrated circuit, it may cause a temporary malfunction of the integrated circuit. This special case is known as a single-event functional interrupt (SEFI). 

An example would be an embedded microcontroller in a complex integrated circuit such as an FPGA. The functions of the microcontroller include power-on sequencing of functional blocks, crypto and security functions, and reprogramming. An SEU that occurs in a register of the microcontroller could cause the microcontroller to malfunction until the SEU is detected and the microcontroller is reset. The consequences of a SEFI are therefore potentially many times more severe than those of an ordinary SEU. 

To prevent SEFIs of this kind in terrestrial and airborne systems, the Platform Management Controller (PMC) found within the FPGA-based AMD Versal product family features a triple-redundant hardwired MicroBlaze CPU with an SEU-optimized voting circuit. Radiation testing has confirmed that the rate of SEFIs in the PMC is effectively zero in ground-level and airborne applications, and roughly one event in six years in geosynchronous earth orbit (GEO), where highly energetic heavy ions are prevalent.
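The principle behind triple redundancy with voting can be illustrated in a few lines. This is a generic sketch of bitwise two-of-three majority voting, not a description of the voting circuit in the PMC; the function names and the example register value are assumptions made for illustration.

```python
def vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit takes the value held by
    at least two of the three redundant copies."""
    return (a & b) | (a & c) | (b & c)


def disagreeing_copies(a: int, b: int, c: int) -> list:
    """Return the indices (0, 1, 2) of copies that differ from the voted value."""
    voted = vote(a, b, c)
    return [i for i, copy in enumerate((a, b, c)) if copy != voted]


if __name__ == "__main__":
    good = 0b10110010
    upset = good ^ (1 << 4)  # an SEU flips bit 4 in the second copy only
    print(f"voted={vote(good, upset, good):08b}",
          f"faulty copies={disagreeing_copies(good, upset, good)}")
```

The voted output stays correct as long as no bit position is upset in more than one copy. Reporting which copy disagreed allows that copy to be rewritten with the voted value before a second upset accumulates.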

Another example of a SEFI is where the configuration memory of an SRAM-based FPGA experiences an SEU. In 90% to 95% of cases, a configuration memory upset will not affect the functionality of the circuit, as most configuration bits are unused in any specific design. In today’s state-of-the-art FPGAs, the configuration memory is protected by error detection and correction encoding. Each configuration word, comprising data and correction bits, is interleaved with other configuration words, physically separating bits in each word and effectively reducing to zero the probability that a single radiation event will cause uncorrectable errors in a configuration word. Software running on the triple-redundant PMC automatically corrects errors in the configuration memory.
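The benefit of interleaving can be shown with a toy model. The sketch below uses a Hamming(7,4) single-error-correcting code as a stand-in; the code, word width, and interleaving depth used in real FPGA configuration memory are not given here and are likely to differ. Four codewords are interleaved bit by bit, one simulated particle strike upsets three physically adjacent cells, and each codeword ends up with at most one error, which the code can correct.

```python
def hamming74_encode(nibble: int) -> list:
    """Encode 4 data bits as a 7-bit Hamming codeword (position 1 first)."""
    code = [0] * 8  # index 0 unused so indices match bit positions 1..7
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    return code[1:]


def hamming74_decode(bits: list) -> int:
    """Correct up to one flipped bit, then return the 4 data bits."""
    code = [0] + list(bits)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # non-zero syndrome is the position of the error
    if syndrome:
        code[syndrome] ^= 1
    data = (code[3], code[5], code[6], code[7])
    return sum(bit << i for i, bit in enumerate(data))


def interleave(words: list) -> list:
    """Lay out bit i of every word side by side: w0b0, w1b0, ..., w0b1, ..."""
    return [word[i] for i in range(len(words[0])) for word in words]


def deinterleave(physical: list, n_words: int) -> list:
    """Recover the individual words from the interleaved physical layout."""
    return [physical[w::n_words] for w in range(n_words)]


if __name__ == "__main__":
    data = [0x3, 0xA, 0x5, 0xC]
    physical = interleave([hamming74_encode(n) for n in data])
    for cell in (9, 10, 11):  # one strike upsets three adjacent cells
        physical[cell] ^= 1
    recovered = [hamming74_decode(w) for w in deinterleave(physical, 4)]
    print("recovered matches original:", recovered == data)
```

If the same three upsets fell within a single non-interleaved codeword, they would exceed what a single-error-correcting code can repair, which is why physical separation of the bits in each word matters.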

Single-Event Latch-Up
The third type of single-event effect is single-event latch-up (SEL). This is a potentially very dangerous phenomenon where a particle with sufficient energy causes a parasitic PNPN structure to become forward-biased, creating a low-impedance current path between power and ground within the microcircuit. 

Figure 2: Configuration Memory Interleaving

In extreme cases, the microcircuit can be immediately destroyed. In other cases, the microcircuit can remain functional after the latch-up is cleared by cycling power, but the damage caused by the excessive current flow may dramatically reduce the lifetime of the component.

Total Ionizing Dose Effects 

Total ionizing dose (TID) effects arise due to the long-term accumulation of radiation over the lifetime of the component. They are usually observed as a gradual increase in leakage current and a gradual deterioration in performance over the lifetime of the part. 

Radiation Testing

Some component manufacturers produce radiation test data for their products, and this is sometimes supplemented by test data produced independently by space agencies, national laboratories, and other research institutions. However, microelectronic components are becoming more complex, featuring multiple processors with multiple architectures, along with diverse memory structures, dedicated DSP resources, arrays of vector processors, gigabit transceivers and other functions. This creates enormous challenges for teams planning radiation test campaigns, as they must create efficient and effective test methodologies to exercise all aspects of the target microcircuit’s operation, while limiting their use of scarce and expensive radiation test facility beam time.

To address the challenges of radiation testing, AMD, for example, has developed a unique, innovative test methodology for the Versal adaptive SoC devices which exercises the many disparate functions of the SoC core. A comprehensive self-checking validation tool causes large amounts of data traffic to move between processing elements, reaching high levels of functional coverage and achieving very efficient use of limited radiation beam time.
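The idea of self-checking traffic can be sketched in general terms: every block of data carries its own check value, so the receiving element can detect corruption without a reference copy. The example below is a generic illustration only and does not represent the AMD validation tool. The packet size, the use of CRC-32, and the function names are assumptions.

```python
import random
import zlib


def make_packet(seed: int, size: int = 64) -> bytes:
    """Build a reproducible pseudo-random payload followed by its CRC-32."""
    rng = random.Random(seed)
    payload = bytes(rng.randrange(256) for _ in range(size))
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check_packet(packet: bytes) -> bool:
    """Recompute the CRC at the receiver and compare with the one carried."""
    payload, crc = packet[:-4], packet[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == crc


def run_traffic(n_packets: int, upset_at=None) -> int:
    """Move packets through a stand-in link and count self-check failures.
    If upset_at is given, one bit of that packet is flipped in transit."""
    errors = 0
    for seed in range(n_packets):
        packet = bytearray(make_packet(seed))
        if seed == upset_at:
            packet[10] ^= 0x04  # simulated single-bit upset
        if not check_packet(bytes(packet)):
            errors += 1
    return errors


if __name__ == "__main__":
    print("errors without upset:", run_traffic(100))
    print("errors with one upset:", run_traffic(100, upset_at=42))
```

Because the check travels with the data, every transfer doubles as a test, which is one way a test program can keep a device busy and observable for the whole time it is in the beam.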

Traceability is Key

Figure 3: Configuration Scrubbing Software Running on Triple Redundant Platform Management Controller

How does the equipment manufacturer know that radiation test data collected months or years before component acquisition apply to the parts that are currently being purchased? 

This is an important question, as radiation effects depend on multiple factors and not just the original design of the integrated circuit. Any drift in the manufacturing process can cause a variation in the TID effects. A change of foundry can result in changes in design rules, which can have a dramatic effect on the onset threshold at which SEL effects appear, possibly rendering a microcircuit unusable in a harsh environment such as GEO or medium earth orbit (MEO), where high concentrations of heavy ions pose a significant threat to microelectronic systems.

Clearly, it is important that developers of space and aviation systems use components that have traceability at least to individual semiconductor fabrication lots. Some suppliers of microcircuits for space and aviation maintain traceability not only to the wafer lot but the specific location of the die in an individual wafer in a wafer lot. Additionally, many suppliers of microcircuits to space and aviation customers will maintain comprehensive change control systems, so that if any manufacturing change needs to be made, customers will be alerted in advance. 
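As a rough illustration of how lot-level traceability might be used, the sketch below records where a die came from and checks whether its fabrication lot is covered by existing radiation test data. The record fields, the lot identifiers, and the matching rule are all hypothetical; actual supplier traceability records and acceptance criteria will differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DieTrace:
    """Hypothetical traceability record for one die."""
    foundry: str
    wafer_lot: str
    wafer: int
    die_x: int
    die_y: int


def covered_by_test_data(part: DieTrace, tested_lots: set) -> bool:
    """Assumed rule: test data applies only to parts from a foundry and
    wafer lot that were themselves radiation tested."""
    return (part.foundry, part.wafer_lot) in tested_lots


if __name__ == "__main__":
    tested_lots = {("FAB-A", "LOT-0001")}  # made-up identifiers
    flight_part = DieTrace("FAB-A", "LOT-0001", wafer=7, die_x=12, die_y=5)
    new_part = DieTrace("FAB-B", "LOT-0001", wafer=3, die_x=4, die_y=9)
    print("flight part covered:", covered_by_test_data(flight_part, tested_lots))
    print("new part covered:", covered_by_test_data(new_part, tested_lots))
```

Including the foundry in the key reflects the point made above: a part with the same design but from a different foundry cannot be assumed to share the tested part's SEL threshold.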

Figure 4: SEFIs in SoC Processor System Showing Distribution and Test Coverage

This level of traceability and change control provides strong assurance for developers whose system designs must function flawlessly in harsh radiation environments, so it is important to use parts from suppliers that offer it.

Conclusion

Modern FPGAs have become go-to solutions for space and aviation systems, and even components intended for terrestrial use must now cope with radiation effects at ground level. In some cases, the hardening added for that purpose results in microcircuits that also tolerate radiation effects in airborne and space applications. Testing is critical before committing a space or airborne design to a specific integrated circuit, and the complexity of today’s leading components creates a significant challenge for organizations performing that testing. Traceability and change control are essential to ensuring that test data gathered before the acquisition of flight components is relevant to the parts being sourced. 
