As we move toward the exascale era, the increasing complexity of supercomputers means that hard and soft errors cause more and more problems in high-performance scientific and engineering computation. To improve the reliability of computing systems (that is, to increase the mean time to failure), considerable effort has been devoted to techniques that forecast, prevent, and recover from errors at different levels, including architecture, application, and algorithm. In this paper, we focus on algorithmic error-resilient iterative linear solvers and introduce a redundant subspace correction method. Using a general framework of redundant subspace corrections, we construct iterative methods that have the following properties: (1) Maintain c...
Some of the present day applications run on computer platforms with large and inexpensive memories, ...
Resilient algorithms in high-performance computing are subject to rigorous non-functional constrain...
As the computational power of high performance computing (HPC) systems continu...
This paper presents a method to protect iterative solvers from Detected and Uncorrected Errors (DUE)...
In the multi-peta-flop era for supercomputers, the number of computing cores is growing expo...
Proceedings of the 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA '15)...
ELLIOTT III, JAMES JOHN. Resilient Iterative Linear Solvers Running Through Errors. (Under the direc...
In this talk we will discuss possible numerical remedies to survive data loss...
Iterative methods are commonly used approaches to solve large, sparse linear systems, which are fund...
Resilience is considered a challenging under-addressed issue that the high performance computing com...
The advent of extreme scale machines will require the use of parallel resource...
As late-CMOS process scaling leads to increasingly variable circuits/logic and as most post-CMOS tec...
As large-scale linear equation systems are pervasive in many scientific fields, great efforts...
We present a fault model designed to bring out the “worst” in iterative solvers based on mathematica...
We investigate the design of dynamic programming algorithms in unreliable memories, i.e., in the pr...