Devices are increasingly vulnerable to soft errors as their feature sizes shrink. Previously, soft error rates were significant primarily in space and high-atmospheric computing. Modern architectures now use features so small at sufficiently low voltages that soft errors are becoming important even at terrestrial altitudes. Due to their large number of components, supercomputers are particularly susceptible to soft errors. Since many large-scale parallel scientific applications use iterative linear algebra methods, the soft error vulnerability of these methods constitutes a large fraction of the applications' overall vulnerability. Many users consider these methods invulnerable to most soft errors since they converge from an imprecise soluti...
Traditionally, fault tolerance researchers have made very strict assumptions about program correctne...
Technology and voltage scaling is making integrated circuits increasingly susceptible to failures ca...
The advent of extreme scale machines will require the use of parallel resource...
In the multi-peta-flop era for supercomputers, the number of computing cores is growing expo...
Iterative methods are commonly used approaches to solve large, sparse linear systems, which are fund...
The conjugate gradient (CG) method is the most widely used iterative scheme fo...
ELLIOTT III, JAMES JOHN. Resilient Iterative Linear Solvers Running Through Errors. (Under the direc...
Emerging high-performance computing platforms, with large component counts and lower power margins, ...
Soft errors caused by transient bit flips have the potential to significantly impact an application's...
This paper presents an empirical investigation on the soft error sensitivity (SES) of microprocessor...
Soft errors due to cosmic rays cause reliability problems during lifetime operation of digital syste...
Many methods are available to detect silent errors in high-performance computi...
This paper proposes the use of metrics to refine system design for soft errors protection in system ...
Current scaling trends in transistor technology, in pursuit of larger component counts a...
Resilient algorithms in high-performance computing are subject to rigorous non-functional constrain...