Fault-tolerant high-performance matrix multiplication: Theory and practice

John A. Gunnels
Daniel S. Katz
Enrique S. Quintana-Ortí
Robert A. van de Geijn

Publication date

January 2001

Abstract

In this paper, we extend the theory of algorithmic fault-tolerant matrix-matrix multiplication, C = AB, in a number of ways. First, we propose low-overhead methods for detecting errors introduced not only in C but also in A and/or B. Second, we theoretically show that the methods will detect all errors as long as only one entry is corrupted. Third, we propose a low-overhead rollback approach to correct errors once detected. Finally, we give a high-performance implementation of matrix-matrix multiplication that incorporates these error detection and correction methods. Empirical results demonstrate that the methods work well in practice with an acceptable level of overhead relative to high-performance implementations without fault-toleranc...
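
The abstract does not spell out the detection scheme, so the following is only a minimal sketch of the classical checksum idea behind algorithmic fault tolerance, not the paper's own method. It assumes all-ones checksum vectors and a hypothetical tolerance `tol`, and it only covers corruption of C. The function names `matmul` and `checksum_mismatch` are illustrative. The check relies on the identities C·1 = A·(B·1) and 1ᵀ·C = (1ᵀ·A)·B, which cost matrix-vector work rather than a second full product.

```python
def matmul(A, B):
    """Plain triple-loop product of two matrices given as lists of rows."""
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def checksum_mismatch(A, B, C, tol=1e-9):
    """Return (bad_rows, bad_cols): rows and columns of C whose sums
    disagree with the sums predicted from A and B alone.
    `tol` is an assumed absolute tolerance, not a value from the paper."""
    n, k, m = len(A), len(B), len(B[0])
    # Row check: each row sum of C should equal the matching entry of A(B1).
    Bw = [sum(row) for row in B]
    ABw = [sum(A[i][p] * Bw[p] for p in range(k)) for i in range(n)]
    bad_rows = [i for i in range(n) if abs(sum(C[i]) - ABw[i]) > tol]
    # Column check: each column sum of C should equal the entry of (1^T A)B.
    vA = [sum(A[i][p] for i in range(n)) for p in range(k)]
    vAB = [sum(vA[p] * B[p][j] for p in range(k)) for j in range(m)]
    bad_cols = [j for j in range(m)
                if abs(sum(C[i][j] for i in range(n)) - vAB[j]) > tol]
    return bad_rows, bad_cols


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C = matmul(A, B)
clean = checksum_mismatch(A, B, C)   # no corruption yet
C[1][0] += 1.0                       # inject a single corrupted entry
faulty = checksum_mismatch(A, B, C)  # flags the affected row and column
```

With a single corrupted entry of C, the flagged row and column intersect at that entry. In floating-point arithmetic the tolerance has to be chosen relative to expected rounding error, which this sketch does not attempt.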
