Advances in IC process technology make future chip multiprocessors (CMPs) increasingly vulnerable to transient faults. To detect transient faults, previous core-level schemes provide redundancy for each core separately. As a result, transient faults in the uncore parts, which occupy over 50% of the area of a modern CMP, may escape detection. This paper proposes RepTFD, the first core-level transient fault detection scheme with 100% coverage. Instead of providing redundancy for each core separately, RepTFD provides redundancy for a group of cores as a whole. Specifically, it replays the execution of the checked group of cores on a redundant group of cores. By comparing the execution results of the two groups of cores, all...
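The group-level replay-and-compare idea in the abstract above can be illustrated with a small software toy. This is only a sketch under stated assumptions, not RepTFD's hardware mechanism: the names `run_group` and `detect_by_replay` are hypothetical, each "core" is modeled as a pure Python function, and a transient fault is modeled as a single flipped bit in one core's result.

```python
from typing import Callable, List, Sequence

Task = Callable[[int], int]


def run_group(tasks: Sequence[Task], inputs: Sequence[int],
              faulty_core: int = -1) -> List[int]:
    """Run one task per modeled core and return all results.

    If faulty_core >= 0, flip the lowest bit of that core's result to
    model a transient fault (an assumption made for illustration).
    """
    results = [task(x) for task, x in zip(tasks, inputs)]
    if faulty_core >= 0:
        results[faulty_core] ^= 1
    return results


def detect_by_replay(tasks: Sequence[Task], inputs: Sequence[int],
                     faulty_core: int = -1) -> bool:
    """Return True if the checked run and its replay disagree."""
    checked = run_group(tasks, inputs, faulty_core)  # checked group
    replayed = run_group(tasks, inputs)              # fault-free replay
    return checked != replayed


# Hypothetical workload: three modeled cores, one input each.
tasks = [lambda x: x + 1, lambda x: x * 2, lambda x: x * x]
inputs = [3, 4, 5]

fault_free = detect_by_replay(tasks, inputs)
with_fault = detect_by_replay(tasks, inputs, faulty_core=1)
```

Comparing the results of the whole group, rather than checking each core in isolation, is what lets a mismatch be flagged regardless of where in the modeled group the corruption occurred.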
The challenge of improving the performance of current processors is achieved by increasing the integ...
This paper describes the design of a power efficient microarchitecture for transient fault detection...
Handling faults is a growing concern in HPC; greater varieties, higher error rates, larger detection...
We propose a scheme for transient-fault recovery called Simultaneously and Redundantly Threaded proc...
CMOS scaling increases susceptibility of microprocessors to transient faults. Most current proposals...
To meet an insatiable consumer demand for greater performance at less power, silicon technology has ...
A new approach is proposed that exploits repetition inherent in programs to provide low-overhead tra...
Abstract: There are many methods of reducing the effects of transient and permanent faults that have be...
Abstract—Transient faults are emerging as a critical concern in the reliability of general-purpose m...
As microprocessors continue to evolve and grow in functionality, the use of smaller nanometer techn...
Transient faults are emerging as a critical concern in the reliability of general-purpose microproce...
Recent embedded real-time software tends to be multithreaded and constrained by stringent timing req...
Abstract: Fault-tolerance is a crucial aspect of safety critical systems. When such systems need to ...
Time redundant execution of tasks and comparison of results is a well-known technique for detecting ...