Transient faults are emerging as a critical concern in the reliability of general-purpose microprocessors. As architectural trends point towards multi-threaded multi-core designs, there is substantial interest in adapting such parallel hardware resources for transient fault tolerance. This paper proposes a software-based multi-core alternative for transient fault tolerance using process-level redundancy (PLR). PLR creates a set of redundant processes per application process and systematically compares the processes to guarantee correct execution. Redundancy at the process level allows the operating system to freely schedule the processes across all available hardware resources. PLR’s software-centric approach to transient fault toleranc...
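The idea of running redundant processes and comparing them can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `run_redundant` and `vote`, the default of three replicas, and the choice to compare only final standard output and exit status are all assumptions made for illustration. The abstract does not say at which points PLR compares its processes.

```python
import subprocess
import sys
from collections import Counter


def vote(results):
    """Return the majority result, or raise if no strict majority exists."""
    winner, count = Counter(results).most_common(1)[0]
    if count <= len(results) // 2:
        raise RuntimeError("replicas disagree and no majority exists")
    return winner


def run_redundant(argv, replicas=3):
    """Run `argv` as several independent processes and vote on the outcome.

    Hypothetical helper: compares only (stdout, exit status) at the end.
    """
    # Start every replica before waiting on any of them, so the operating
    # system is free to schedule them across whatever cores are available.
    procs = [subprocess.Popen(argv, stdout=subprocess.PIPE) for _ in range(replicas)]
    results = []
    for proc in procs:
        stdout, _ = proc.communicate()  # waits for this replica to exit
        results.append((stdout, proc.returncode))
    return vote(results)


if __name__ == "__main__":
    out, code = run_redundant([sys.executable, "-c", "print(6 * 7)"])
    print(out.decode().strip(), code)
```

With three replicas, a fault that corrupts the output of one process is outvoted by the other two. With only two replicas a mismatch can be detected but not resolved, so `vote` raises instead of returning a result.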
As processor manufacturers keep pushing the limits of the transistor, the reliability of computer sy...
Future multicore processors will become more susceptible to a variety of hardware failures. In parti...
Abstract: Fault-tolerance is a crucial aspect of safety critical systems. When such systems need to ...
Transient faults are emerging as a critical concern in the reliability of general-purpose microproce...
Abstract—Transient faults are emerging as a critical concern in the reliability of general-purpose m...
This paper speculates that technology trends pose new challenges for fault tolerance in microprocess...
A new approach is proposed that exploits repetition inherent in programs to provide low-overhead tra...
Continued CMOS scaling is expected to make future microprocessors susceptible to transient faults, ...
Development trends for computing platforms moved from increasing the frequency...
This paper describes a single-version algorithmic approach to design in fault tolerant computing in ...
Smaller transistor sizes and reduction in voltage levels in modern microprocessors induce higher sof...
Failing hardware is a fact and trends in microprocessor design indicate that the fraction of hardwar...
Transient faults are emerging as a critical reliability concern for modern microprocessors. Recentl...
We propose a scheme for transient-fault recovery called Simultaneously and Redundantly Threaded proc...
In this dissertation we address the overhead reduction of fault tolerance (FT) techniques. Due to te...