Soft errors are becoming more frequent in modern computer systems. These faults can corrupt the results of numerical solvers commonly used in scientific and electromagnetic simulations. If a bit flip is severe enough, the numerical code may never converge. Several techniques address soft faults in numerical solvers; self-stabilizing methods and Algorithm-Based Fault Tolerance (ABFT) are the most popular choices for designing a fault-tolerant scheme. Self-stabilizing numerical methods have been developed to recover numerical stability in the presence of faults, at the cost of running computation-intensive reliable iterations. Our work presents efficient techniques to determine when to ...
Algorithm-based fault-tolerance (ABFT) is an inexpensive method of incorporating fault-tolerance int...
With the proliferation of parallel and distributed systems, it is an increasingly important problem ...
Iterative methods are commonly used approaches to solve large, sparse linear systems, which are fund...
As hardware devices like processor cores and memory sub-systems based on nano-scale technol...
This paper revisits the interconnection of self-stabilization and fault-tolerance. Self-stabilizing ...
Self-stabilizing algorithms recover from all cases of transient failure, but the mechanism of self-s...
We present a fault model designed to bring out the “worst” in iterative solvers based on mathematica...
A key issue confronting petascale and exascale computing is the growth in probability of soft and ha...
This paper continues to develop a fault tolerant extension of the sparse grid combination technique ...
ELLIOTT III, JAMES JOHN. Resilient Iterative Linear Solvers Running Through Errors. (Under the direc...
In this talk we will discuss possible numerical remedies to survive data loss...
Our purpose in the present paper is to present a brief overview of the relatively new paradigm of se...
The advent of extreme scale machines will require the use of parallel resource...
Fault tolerance measures can be used to distinguish between different self-stabilizing solu...