Proceedings of the 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA '15). As the transistor’s feature size decreases following Moore’s Law, hardware will become more prone to permanent, intermittent, and transient errors, increasing the number of failures experienced by applications and diminishing the confidence of users. As a result, resilience is considered the most difficult under-addressed issue faced by the High Performance Computing community. In this paper, we address the design of error-resilient iterative solvers for sparse linear systems. Contrary to most previous approaches, which are based on Krylov subspace methods, we analyze stationary component-wise relaxation for this purpose. Concre...
Abstract. Resilience is a major challenge for large-scale systems. It is particularly important for ...
This paper continues to develop a fault tolerant extension of the sparse grid combination technique ...
This paper continues to develop a fault tolerant extension of the sparse grid combination technique ...
Resilience is considered a challenging under-addressed issue that the high performance computing com...
As we stride toward the exascale era, due to increasing complexity of supercomputers, hard and soft ...
International audience: The advent of extreme scale machines will require the use of parallel resour...
This paper presents a method to protect iterative solvers from Detected and Uncorrected Errors (DUE)...
Large scale simulations are used in a variety of application areas in science and engineering to hel...
We present a fault model designed to bring out the “worst” in iterative solvers based on mathematica...
International audience: As the computational power of high performance computing (HPC) systems continu...
Energy increasingly constrains modern computer hardware, yet protecting computations and data agains...
3rd International Workshop on Energy Efficient Supercomputing (E2SC '15). We formulate an implementati...
ELLIOTT III, JAMES JOHN. Resilient Iterative Linear Solvers Running Through Errors. (Under the direc...
We propose an adaptive scheme to reduce communication overhead caused by data movement by selectivel...
Iterative methods are commonly used approaches to solve large, sparse linear systems, which are fund...