Dense matrix factorizations, such as LU, Cholesky and QR, are widely used by scientific applications that require solving systems of linear equations, eigenvalue problems and linear least squares problems. Such computations are normally carried out on supercomputers, whose ever-growing scale induces a fast decline of the Mean Time To Failure (MTTF). This dissertation develops fault tolerance algorithms for one-sided dense matrix factorizations, which handle both hard and soft errors. For hard errors, we propose methods based on diskless checkpointing and Algorithm Based Fault Tolerance (ABFT) to provide full matrix protection, including the left and right factors that are normally seen in dense matrix factorizations. A horizontal parallel diskless ...
Abstract—Modeling and analysis of large scale scientific systems often use linear least squares regr...
Abstract—The general purpose graphics processing units (GPGPU) are increasingly deployed for scienti...
As the desire of scientists to perform ever larger computations drives the size of today’s high perf...
Dense matrix factorizations, like LU, Cholesky and QR, are widely used for scientific applications t...
Dense matrix factorizations like LU, Cholesky and QR are widely used for scientific applications tha...
Dense matrix factorizations, such as LU, Cholesky and QR, are widely used for scientific application...
The mean time between failure (MTBF) of large supercomputers is decreasing, and future exascale comp...
This paper presents an algorithm based fault tolerance method to harden three two-sided matrix facto...
The lack of efficient resilience solutions is expected to be a major problem for the coming exascale...
In the multi-peta-flop era for supercomputers, the number of computing cores is growing expo...
Iterative methods are commonly used approaches to solve large, sparse linear systems, which are fund...
As large-scale linear equation systems are pervasive in many scientific fields, great efforts...
On the road to exascale computing, the gap between hardware peak performance and application perform...
Emerging high-performance computing platforms, with large component counts and lower power margins, ...
Devices are increasingly vulnerable to soft errors as their feature sizes shrink. Previously, soft e...