On future extreme-scale computers, faults are expected to become an increasingly serious problem as the number of individual components grows and failures become more frequent. This is driving interest in algorithms with built-in fault tolerance that can continue to operate and recover lost data even if part of the computation is lost in a failure. For fault-free computations, the use of adaptive refinement techniques in combination with finite element methods is well established. Furthermore, iterative solution techniques that incorporate information about the grid structure, such as the parallel geometric multigrid method, have been shown to be an efficient approach to solving various types of partial different ...
Computational methods based on the use of adaptively constructed nonuniform meshes reduce the amount...
This work deals with high performance computing on large scale platforms like computing grids. Compu...
In this talk we will discuss possible numerical remedies to survive data loss...
We examine novel fault tolerance schemes for data loss in multigrid solvers which essentially combin...
A key issue confronting petascale and exascale computing is the growth in probability of sof...
With the proliferation of parallel and distributed systems, it is an increasingly important problem ...
As the computational power of high performance computing (HPC) systems continu...
Thesis (Ph.D.), University of Illinois at Urbana-Champaign, 2002. 266 p. A parallel multigrid algorith...
Several recovery techniques for parallel iterative methods are presented. First, the implementation ...
Algorithm-based fault-tolerance (ABFT) is an inexpensive method of incorporating fault-tolerance int...
We present a new approach to the use of parallel computers with adaptive finite element methods. Thi...
This paper continues to develop a fault tolerant extension of the sparse grid combination technique ...
We investigate parallel adaptive grid refinement and focus in particular on hierarchically adaptive,...
The ability to perform effective adaptive analysis has become a critical issue in the area of physic...