With the advent of exascale computing and the realization that memory errors will be an ever more important part of the high performance computing landscape, this paper proposes reconsidering stochastic linear solvers for their inherent scalability and resiliency. It addresses the latter by analyzing the resiliency of stochastic solvers to randomly occurring memory errors that go undetected. The premise is that, in stochastic solvers, undetected errors can be considered part of the random process, while detectable errors can be filtered using basic statistics. Thus, the goal is not to detect all memory errors but only those that matter, and to quantify their frequency, which will impact efficiency. A simple iterati...
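The premise that small errors blend into the sampling noise while large ones can be filtered with basic statistics can be illustrated with a minimal sketch. This is not the paper's algorithm, which the truncated abstract does not specify. It assumes a classical random-walk (Neumann-Ulam style) estimator for x = Hx + b, a single random bit flip in a stored sample as the fault model, and a median/MAD outlier filter. All names and parameters (`walk_estimate`, `flip_random_bit`, `filtered_mean`, `p_fault`, the matrix `H`, the cutoff `k`) are hypothetical choices made for illustration.

```python
import math
import random
import statistics
import struct

# Hypothetical test problem: x = Hx + b with absolute row sums below 1.
H = [[0.1, 0.2, 0.1],
     [0.2, 0.1, 0.1],
     [0.1, 0.1, 0.3]]
B = [1.0, 2.0, 3.0]


def flip_random_bit(value, rng):
    """Assumed fault model: flip one random bit of an IEEE-754 double."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", value))
    bits ^= 1 << rng.randrange(64)
    (corrupted,) = struct.unpack("<d", struct.pack("<Q", bits))
    return corrupted


def walk_estimate(H, b, start, length, rng):
    """One random-walk sample of component `start` of the solution."""
    state, weight, total = start, 1.0, b[start]
    for _ in range(length):
        row = H[state]
        abs_row = [abs(h) for h in row]
        row_sum = sum(abs_row)
        if row_sum == 0.0:
            break  # no outgoing transitions: the walk is absorbed
        nxt = rng.choices(range(len(row)), weights=abs_row)[0]
        # Importance weight H_ij / p_ij with p_ij = |H_ij| / row_sum.
        weight *= math.copysign(row_sum, row[nxt])
        total += weight * b[nxt]
        state = nxt
    return total


def filtered_mean(samples, k=6.0):
    """Drop non-finite samples and outliers beyond k scaled MADs."""
    finite = [s for s in samples if math.isfinite(s)]
    med = statistics.median(finite)
    mad = statistics.median([abs(s - med) for s in finite])
    limit = k * 1.4826 * mad  # 1.4826 scales MAD to a normal std. dev.
    kept = [s for s in finite if abs(s - med) <= limit]
    return sum(kept) / len(kept), len(samples) - len(kept)


def solve(H, b, n_walks=4000, length=20, p_fault=0.02, seed=0):
    """Return (raw, filtered) Monte Carlo estimates under injected faults."""
    rng = random.Random(seed)
    raw, filtered = [], []
    for i in range(len(b)):
        samples = []
        for _ in range(n_walks):
            s = walk_estimate(H, b, i, length, rng)
            if rng.random() < p_fault:
                s = flip_random_bit(s, rng)  # silent corruption
            samples.append(s)
        raw.append(sum(samples) / n_walks)
        filtered.append(filtered_mean(samples)[0])
    return raw, filtered


def fixed_point(H, b, iters=200):
    """Deterministic reference: iterate x <- Hx + b."""
    x = list(b)
    for _ in range(iters):
        x = [sum(h * xj for h, xj in zip(row, x)) + bi
             for row, bi in zip(H, b)]
    return x


if __name__ == "__main__":
    raw, filtered = solve(H, B)
    print("reference:", fixed_point(H, B))
    print("raw mean :", raw)
    print("filtered :", filtered)
```

In this sketch, flips in low mantissa bits pass the filter and act as additional sampling noise, whereas flips in the sign bit or high exponent bits, and any resulting NaN or infinity, are discarded. The number of discarded samples is the quantity that would drive the efficiency cost the abstract mentions.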
Maintaining the reliability of integrated circuits as transistor sizes continue to shrink...
This paper presents a method to protect iterative solvers from Detected and Uncorrected Errors (DUE)...
Low-density parity-check (LDPC) codes are very powerful linear error-correcting codes, first introdu...
Some of the present day applications run on computer platforms with large and inexpensive memories, ...
ELLIOTT III, JAMES JOHN. Resilient Iterative Linear Solvers Running Through Errors. (Under the direc...
Some of today’s applications run on computer platforms with large and inexpensive memories, which ar...
Large and inexpensive memory devices may suffer from faults, where some bits may arbitrarily flip an...
As technology continues to scale down, the probability for hardware error to occur becomes ...
Today's nano-scale technology nodes are bringing reliability concerns back to the center stage of di...
Stochastic computing (SC) is an unconventional technique that has recently re-emerged as an attracti...
As device sizes shrink, device-level manufacturing challenges have led to increased variability in p...
Mounting concerns over variability, defects, and noise motivate a new approach for digital ...
This paper offers a review of recent developments in non-deterministic error c...
As traditional approaches for reducing power in microprocessors are being exhausted, extreme power c...