Traditional fault-tolerant multi-threading architectures achieve good fault tolerance by re-executing every computation. However, full re-execution significantly increases the demand on processor resources, resulting in severe performance degradation. To address this problem, this dissertation presents Active Verification Management (AVM) approaches that use a checker hierarchy to improve performance with minimal effect on overall reliability. Based on a simplified queueing model, AVM employs a filter checker that prioritizes verification candidates so that verification is performed selectively. This dissertation proposes three filter checkers, based on (1) result usage, (2) result bitwidth, and (3) result anomaly, that...
In this paper, we combine the traditional checkpointing and rollback recovery ...
In microprocessors, achieving an efficient utilization of the execution units is a key factor in imp...
This paper investigates the optimal number of processors to execute a parallel...
Over the past four decades microprocessors have come to be a vital and inseparable part of the moder...
Microprocessors are becoming increasingly vulnerable to soft errors due to the current tren...
As high computing power is available at an affordable cost, we rely on microprocessor-based systems ...
Errors have become a critical problem for high performance computing. Checkpoi...
As semiconductor technology scales into the deep submicron regime the occurrence of transient or sof...
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Com...
Resilience has become a critical problem for high performance computing. Checkpointing protocols are...
Building a high-performance microprocessor presents many reliability challenges. Designers must ver...
Reliability of transistors is on the decline as transistors continue to shrink in size. Aggressive v...
Technology scaling has led to growing concerns about reliability in microprocessors. Currently, faul...
Our quest for faster and more efficient computing devices has led us to processor designs with enormous c...
As device dimensions continue to be aggressively scaled, microprocessors are becoming increasingly v...