The rising count and shrinking feature size of transistors within modern computers are making them increasingly vulnerable to various types of soft faults. This problem is especially acute in high-performance computing (HPC) systems used for scientific computing, because these systems include many thousands of compute cores and nodes, all of which may be utilized in a single large-scale run. The increasing vulnerability of HPC applications to errors induced by soft faults is motivating extensive work on techniques to make these applications more resilient to such faults, ranging from generic techniques such as replication or checkpoint/restart to algorithm-specific error detection and tolerance techniques. Effective use of such techniques re...
Technology and voltage scaling are making integrated circuits increasingly susceptible to failures ca...
The complexity of integrated systems-on-chip as well as commercial processor architectures has incr...
In this article, several methods are outlined for detecting functional changes in an IC due to exter...
Current scaling trends in transistor technology, in pursuit of larger component counts a...
Dependability and fault tolerance are important aspects of modern computer systems. Particle str...
In the modern era of computing, processors are increasingly susceptible to soft errors. Current solu...
As high-performance computing (HPC) continues to progress, constraints on HPC system design force t...
Soft errors are faults that are not caused by defective hardware; rather, they are induced due to no...
Traditionally, fault tolerance researchers have made very strict assumptions about program correctne...
Soft errors caused by transient bit flips have the potential to significantly impact an application's...
Resilient algorithms in high-performance computing are subject to rigorous non-functional constrain...
Technology scaling has led to growing concerns about reliability in microprocessors. Currently, faul...
This paper presents an empirical investigation on the soft error sensitivity (SES) of microprocessor...
Soft errors are considered a key reliability challenge for sub-nano scale transistors. ...
Successive generations of processors use smaller transistors in the quest to make more powerful comp...