As large-scale linear equation systems are pervasive in many scientific fields, considerable effort has gone over the last decade into developing efficient techniques to solve such systems, possibly relying on High Performance Computing (HPC) infrastructures to boost performance. In this context, the ever-growing scale of supercomputers inevitably increases the frequency of faults, making fault tolerance a crucial issue in HPC application development. A previous study [1] investigated the possibility of enhancing the Inhibition Method (IMe), a linear systems solver for dense unstructured matrices, with fault tolerance to single hard errors, i.e. failures causing one computing processor to stop. This article extends [1] by proposing an efficient technique ...
This dissertation details contributions made by the author to the field of computer science while wo...
As the number of processors in today’s parallel systems continues to grow, the mean-time-to-failure ...
This paper compares the performance of different approaches to tolerate failur...
Dense matrix factorizations, such as LU, Cholesky and QR, are widely used for scientific application...
In the multi-peta-flop era for supercomputers, the number of computing cores is growing expo...
Dense matrix factorizations, like LU, Cholesky and QR, are widely used for scientific applications t...
With the proliferation of parallel and distributed systems, it is an increasingly important problem ...
We present a new approach to fault tolerance for High Performance Computing systems. Our approach is ...
Energy increasingly constrains modern computer hardware, yet protecting computations and data agains...
The lack of efficient resilience solutions is expected to be a major problem for the coming exascale...
The advent of extreme scale machines will require the use of parallel resource...
Dense matrix factorizations like LU, Cholesky and QR are widely used for scientific applications tha...
Iterative methods are commonly used approaches to solve large, sparse linear systems, which are fund...
The rapid progress in VLSI technology has reduced the cost of hardware, allowing multiple ...