This paper presents an algorithm-based fault tolerance method to harden three two-sided matrix factorizations against soft errors: reduction to Hessenberg form, tridiagonal form, and bidiagonal form. These two-sided factorizations are usually prerequisites for computing eigenvalues and eigenvectors and for the singular value decomposition. Algorithm-based fault tolerance has been shown to work on the three main one-sided matrix factorizations (LU, Cholesky, and QR), but extending it to cover two-sided factorizations is non-trivial because there is no obvious offline, problem-specific way to maintain checksums. We thus develop an online, algorithm-specific checksum scheme and show how to systematically adapt the two-sided factorization algorithms used in ...
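To make the checksum idea concrete, the following is a minimal sketch of the classic algorithm-based fault tolerance encoding for a matrix product, where a checksum row and a checksum column are carried through the computation and later used to locate and repair a single corrupted entry. It is not the online scheme for two-sided factorizations that the abstract describes; it only illustrates the general principle on the simpler multiplication case. All function names (`matmul`, `with_column_checksum`, `with_row_checksum`, `locate_error`, `correct_error`) are hypothetical, and the single-error assumption is ours.

```python
def matmul(a, b):
    """Plain triple-loop matrix product on lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def with_column_checksum(a):
    """Append one row holding the sum of every column of a."""
    return [row[:] for row in a] + [[sum(col) for col in zip(*a)]]


def with_row_checksum(b):
    """Append one column holding the sum of every row of b."""
    return [row[:] + [sum(row)] for row in b]


def locate_error(c_full, tol=1e-9):
    """Return (row, col) of a single corrupted data entry, or None.

    c_full is the product of a column-checksummed A and a row-checksummed B,
    so its last row and last column hold the checksums of the data block.
    Assumption: at most one data entry is corrupted.
    """
    n, m = len(c_full) - 1, len(c_full[0]) - 1
    bad_rows = [i for i in range(n)
                if abs(sum(c_full[i][:m]) - c_full[i][m]) > tol]
    bad_cols = [j for j in range(m)
                if abs(sum(c_full[i][j] for i in range(n)) - c_full[n][j]) > tol]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return bad_rows[0], bad_cols[0]
    raise ValueError("error pattern is not a single corrupted data entry")


def correct_error(c_full, pos):
    """Rebuild the corrupted entry from its row checksum, in place."""
    i, j = pos
    m = len(c_full[0]) - 1
    others = sum(c_full[i][k] for k in range(m) if k != j)
    c_full[i][j] = c_full[i][m] - others


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c_full = matmul(with_column_checksum(a), with_row_checksum(b))

c_full[1][0] += 5                 # simulate a soft error in one entry
pos = locate_error(c_full)        # intersecting bad row and bad column
correct_error(c_full, pos)
recovered = [row[:2] for row in c_full[:2]]
```

After the repair, `recovered` matches the fault-free product `matmul(a, b)`. The difficulty the abstract points to is that a two-sided factorization applies transformations from both the left and the right, so a checksum appended once before the computation starts is not automatically preserved in this way.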
Iterative methods are commonly used approaches to solve large, sparse linear systems, which are fund...
We present new algorithms to detect and correct errors in the lower-upper fact...
Soft errors are increasing in modern computer systems. These faults can corrupt the results of nume...
The lack of efficient resilience solutions is expected to be a major problem for the coming exascale...
Dense matrix factorizations like LU, Cholesky and QR are widely used for scientific applications tha...
Dense matrix factorizations, like LU, Cholesky and QR, are widely used for scientific applications t...
Dense matrix factorizations, such as LU, Cholesky and QR, are widely used for scientific application...
The mean time between failure (MTBF) of large supercomputers is decreasing, and future exascale comp...
Dense matrix factorizations, such as LU, Cholesky and QR, are widely used by scientific applications...
Current algorithm-based fault tolerance (ABFT) approach for one-sided matrix decomposition on hetero...
In this paper, we extend the theory of algorithmic fault-tolerant matrix-matrix multiplication, C =...
This paper compares several fault-tolerance methods for the detection and corr...
The rapid progress in VLSI technology has reduced the cost of hardware, allowing multiple ...
In high-performance systems, the probability of failure is higher for larger systems. The probabili...
Modeling and analysis of large scale scientific systems often use linear least squares regr...