The increasing size of supercomputers makes it possible to compute solutions for various computational problems that, not so long ago, consumed too much time and memory to be solved. At that scale, however, new challenges arise and hamper the optimal performance of dedicated algorithms running on these parallel and distributed architectures. Large-scale systems raise challenges of their own, including system reliability, availability, and scalability, owing to the sheer number and complexity of their hardware and software components. In particular, hardware or software failures may occur at any moment during the execution of parallel applications that, therefore, cannot complete unless these...
This work studies the reliability of embedded systems with approximate computing on software and har...
The parallel computing platforms available today are increasingly large. Typically, the emerging par...
Dense matrix factorizations, such as LU, Cholesky and QR, are widely used for scientific application...
As the computational power of high performance computing (HPC) systems continues to increase by usin...
As the number of processors in today’s parallel systems continues to grow, the mean-time-to-failure ...
A complex computer system consists of billions of transistors, miles of wires, and many interacti...
The difficulty of designing fault-tolerant distributed algorithms increases with the severity of fa...
The scaling up of new parallel and distributed computing platforms raises many...
This work deals with high performance computing on large scale platforms like computing grids. Compu...
Efficient parallel algorithms proposed to solve many fundamental problems in scientific computation ...
Checkpoint and recovery cost imposed by checkpoint/restart (CP/R) is a crucial performance issue for...
With the proliferation of parallel and distributed systems, it is an increasingly important problem ...
We present a new approach to fault tolerance for High Performance Computing systems. Our approach is ...
Dense matrix factorizations, like LU, Cholesky and QR, are widely used for scientific applications t...