Abstract. In the multi-petaflop era for supercomputers, the number of computing cores is growing exponentially. However, as integrated circuit technology scales below 65 nm, the critical charge required to flip a gate or a memory cell has been reduced, causing a higher soft error rate from cosmic radiation. Soft errors affect computers by producing silent data corruption, which is hard to detect and correct. Current research on soft error resilience for dense linear solvers offers limited capability when facing large-scale computing systems, and suffers from both soft errors and round-off error due to floating-point arithmetic. This work proposes a fault-tolerant algorithm that recovers the solution of a dense linear system Ax = b fro...
International audience: The advent of extreme scale machines will require the use of parallel resour...
We present a fault model designed to bring out the “worst” in iterative solvers based on mathematica...
Abstract. Resilience is a major challenge for large-scale systems. It is particularly important for ...
As large-scale linear equation systems are pervasive in many scientific fields, great efforts have b...
ELLIOTT III, JAMES JOHN. Resilient Iterative Linear Solvers Running Through Errors. (Under the direc...
International audience: The advent of extreme scale machines will require the use of parallel resource...
As we stride toward the exascale era, due to increasing complexity of supercomputers, hard and soft ...
Devices are increasingly vulnerable to soft errors as their feature sizes shrink. Previously, soft e...
Resilient algorithms in high-performance computing are subject to rigorous non-functional constrain...
Dense matrix factorizations, such as LU, Cholesky and QR, are widely used by scientific applications...
Dense matrix factorizations, like LU, Cholesky and QR, are widely used for scientific applications t...
Iterative methods are commonly used approaches to solve large, sparse linear systems, which are fund...
Dense matrix factorizations, such as LU, Cholesky and QR, are widely used for scientific application...
Dense matrix factorizations like LU, Cholesky and QR are widely used for scientific applications tha...
The threat of soft error induced system failure in high performance computing systems has become mor...