ELLIOTT III, JAMES JOHN. Resilient Iterative Linear Solvers Running Through Errors. (Under the direction of Frank Mueller.) Future extreme-scale computer systems may expose incorrect behavior to applications, in order to save energy or increase performance. However, resilience research struggles to come up with useful abstract programming models for reasoning about faults in applications. This work mainly focuses on silent soft errors, that is, errors that do not cause the system to halt and provide no indication that they occurred. The approach presented is not specific to silent soft errors; it is a general model for tolerating abnormal behavior in numerical algorithms. We present findings targeted at silent faults that impact the data us...
Traditionally, fault tolerance researchers have made very strict assumptions about program correctne...
With the advent of exascale computing and the realization that memory errors will be an ever importa...
Soft errors caused by transient bit flips have the potential to significantly impact an application's...
Resilient algorithms in high-performance computing are subject to rigorous non-functional constrain...
We present a fault model designed to bring out the “worst” in iterative solvers based on mathematica...
Some of the present day applications run on computer platforms with large and inexpensive memories, ...
The advent of extreme scale machines will require the use of parallel resource...
In the multi-peta-flop era for supercomputers, the number of computing cores is growing expo...
According to Moore’s law, technology scaling is continuously providing smaller and faster devices. T...
Energy increasingly constrains modern computer hardware, yet protecting computations and data agains...
Soft errors are faults that are not caused by defective hardware; rather, they are induced due to no...
Large and inexpensive memory devices may suffer from faults, where some bits may arbitrarily flip an...
This work focuses on resilience techniques at extreme scale. Many papers deal ...
This thesis focuses on resilience for high performance applications that execute on large scale plat...
Some of today’s applications run on computer platforms with large and inexpensive memories, which ar...