As processor manufacturers keep pushing the limits of the transistor, the reliability of computer systems has become an increasing concern. Various fault tolerance techniques have been developed in an effort to provide reliable computing in the presence of faults. These approaches suffer from either a high resource cost or high performance overhead. This thesis presents a design for a Fault Tolerance Core (FTC) that uses configurable application-aware hardware modules for improving reliability. Application-aware fault tolerance is achieved by detecting perturbations in application execution through the monitoring of processor pipeline signals. This approach leverages hardware resources more efficiently than replication. The FTC achieves low...
Intermittent hardware faults are hard to diagnose as they occur non-deterministically. Hardware-only...
This paper speculates that technology trends pose new challenges for fault tolerance in microprocess...
Faced with the exponential growth in computing requirements, programmable hardware accelerators, suc...
Transient faults are emerging as a critical concern in the reliability of general-purpose m...
One of the major driving forces of the semiconductor industry is the continuous scaling of the silic...
Microprocessor-based systems are employed in an increasing number of applications where dependabilit...
In the recent past, there has been an increasing demand for low-cost safety critical application...
In this dissertation we address the overhead reduction of fault tolerance (FT) techniques. Due to te...
This report describes an experiment in the design of a general purpose fault tolerant system, FTM. T...
There is broad consensus among academic and industrial researchers in computer architecture that har...
Transient hardware faults have become one of the major concerns affecting the reliability of modern ...
The evolution of high-performance and low-cost microprocessors has led to their almost pervasive usa...
Clusters of message-passing computing nodes provide high-performance platforms for distributed appli...
Due to the character of the original source materials and the nature of batch digitization, quality ...
Fault tolerance is a key requirement in several application domains of embedded processor cores. In...