Researchers have developed numerous algorithms for computationally intensive tasks that are well suited to hypercube multiprocessors. In this thesis, we look at parallel algorithm design from a different perspective: providing on-line detection of hardware errors through software techniques, without any hardware modifications. This approach is called algorithm-based error detection. We report on the implementation of system-level error detection mechanisms for four parallel applications on a 16-processor Intel iPSC-2/D4/MX hypercube multiprocessor: (1) matrix multiplication; (2) fast Fourier transform; (3) QR factorization; (4) singular value decomposition. We describe extensive studies of the error coverage of our sy...
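The abstract does not show the checking code itself. As an illustration only, the sketch below applies the row/column checksum idea of Huang and Abraham to matrix multiplication: A is extended with a row of column sums, B with a column of row sums, and their product then carries checksums of C that can be re-verified once the multiplication finishes. The function names and the fixed absolute tolerance `tol` are assumptions for this example, not the thesis's implementation; real floating-point data would call for a tolerance scaled to the magnitude of the entries.

```python
"""Checksum-based error detection for matrix multiplication (illustrative sketch)."""

from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop product; stands in for the (possibly faulty) computation."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(m)] for i in range(n)]


def add_column_checksum(a: Matrix) -> Matrix:
    """Return A with an extra last row holding the sum of each column."""
    return a + [[sum(col) for col in zip(*a)]]


def add_row_checksum(b: Matrix) -> Matrix:
    """Return B with an extra last column holding the sum of each row."""
    return [row + [sum(row)] for row in b]


def check_full_checksum(c_f: Matrix, tol: float = 1e-9) -> bool:
    """True if the checksum row and column of C_f still match its data block C."""
    n, m = len(c_f) - 1, len(c_f[0]) - 1  # dimensions of the data block C
    rows_ok = all(abs(sum(c_f[i][:m]) - c_f[i][m]) <= tol for i in range(n))
    cols_ok = all(abs(sum(c_f[i][j] for i in range(n)) - c_f[n][j]) <= tol
                  for j in range(m))
    return rows_ok and cols_ok


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c_f = matmul(add_column_checksum(a), add_row_checksum(b))
print(check_full_checksum(c_f))  # True
c_f[0][1] += 1.0                 # simulate one corrupted element of C
print(check_full_checksum(c_f))  # False
```

Because a single corrupted element breaks exactly one row check and one column check, the same full-checksum layout can also locate that element and correct it from the checksums; the sketch above only detects.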
We propose a low-cost concurrent error detection strategy to improve the Reliability, Availability, ...
Error-detecting algorithms can determine when, at run time, a program deviates from its expected beh...
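One well-known example of such a run-time check, not necessarily the one used in the work summarized here, is Freivalds' randomized verification of a matrix product: rather than recomputing C = AB in O(n^3) time, it compares A(Br) with Cr for a random 0/1 vector r in O(n^2) time. The Python sketch below assumes exact integer arithmetic (floating-point data would need a tolerance), and its function names are illustrative.

```python
"""Freivalds' randomized result checker for a matrix product C = A * B."""

import random
from typing import List, Optional

Matrix = List[List[int]]


def mat_vec(m: Matrix, v: List[int]) -> List[int]:
    """Multiply matrix m by the column vector v."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def freivalds_trial(a: Matrix, b: Matrix, c: Matrix, r: List[int]) -> bool:
    """One check with a fixed vector r: True if A(Br) == Cr.

    Three matrix-vector products cost O(n^2), versus O(n^3) to recompute AB.
    """
    return mat_vec(a, mat_vec(b, r)) == mat_vec(c, r)


def freivalds(a: Matrix, b: Matrix, c: Matrix, trials: int = 20,
              rng: Optional[random.Random] = None) -> bool:
    """Return False as soon as some trial proves C != AB, else True.

    If C != AB, a uniformly random 0/1 vector exposes the difference with
    probability at least 1/2, so all `trials` independent trials miss it
    with probability at most 2**-trials.
    """
    if rng is None:
        rng = random.Random()
    m = len(b[0])
    for _ in range(trials):
        r = [rng.randint(0, 1) for _ in range(m)]
        if not freivalds_trial(a, b, c, r):
            return False
    return True


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
good = [[19, 22], [43, 50]]
bad = [[19, 22], [43, 51]]                  # one corrupted element
print(freivalds(a, b, good))                # True: a correct C always passes
print(freivalds_trial(a, b, bad, [1, 1]))   # False: this r exposes the error
```

A `True` result only means "probably correct", while a `False` result is a certain detection; the number of trials trades checking time against the chance of missing an error.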
There is broad consensus among academic and industrial researchers in computer architecture that har...
National Science Foundation / NSF MIP 86-57563. U of I Only: Restricted to UIUC community
Algorithm-based fault tolerance (ABFT) is an inexpensive method of incorporating fault tolerance int...
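The same checksum idea carries over to other linear computations. For the discrete Fourier transform, any correct output X of an N-point input x satisfies two cheap identities: X[0] equals the sum of the inputs, and the sum of all outputs equals N·x[0]. The sketch below checks both after a textbook radix-2 FFT. The function names, the power-of-two length, and the tolerance rule are assumptions for illustration, not a scheme taken from the work summarized here.

```python
"""Checksum-style error detection for an FFT (illustrative sketch)."""

import cmath
from typing import List


def fft(x: List[complex]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor times odd part
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft_checksum_ok(x: List[complex], big_x: List[complex], tol: float = 1e-9) -> bool:
    """Check two identities that any correct DFT of x satisfies:

    X[0] == sum(x)        (the DC term is the sum of the inputs)
    sum(X) == N * x[0]    (the inverse DFT evaluated at n = 0)
    """
    n = len(x)
    scale = tol * max(1.0, sum(abs(v) for v in x))  # tolerance relative to input size
    return (abs(big_x[0] - sum(x)) <= scale
            and abs(sum(big_x) - n * x[0]) <= n * scale)


x = [1, 2, 3, 4, 0, 0, 0, 0]
big_x = fft(x)
print(fft_checksum_ok(x, big_x))  # True
big_x[3] += 0.5                   # simulate one corrupted output
print(fft_checksum_ok(x, big_x))  # False
```

The check costs O(N) on top of the O(N log N) transform. It is not complete: errors at two outputs that cancel in the sum escape it, which is one reason weighted checksums are often used instead.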
Microprocessor error detection is increasingly important, as the number of transistors in modern sys...
The rapid progress in VLSI technology has reduced the cost of hardware, allowing multiple ...
Deep learning technology has enabled the development of increasingly complex safety-related autonomo...
We present a new approach to fault tolerance for High Performance Computing systems. Our approach is ...