A comparison of several fault-tolerance methods for the detection and correction of floating-point errors in matrix-matrix multiplication

Le Fèvre, Valentin
Herault, Thomas
Langou, Julien
Robert, Yves

Publication date

August 2020

Publisher

HAL CCSD

Abstract

This paper compares several fault-tolerance methods for the detection and correction of floating-point errors in matrix-matrix multiplication. These methods include replication, triplication, Algorithm-Based Fault Tolerance (ABFT) and residual checking (RC). Error correction for ABFT can be achieved either by solving a small-size linear system of equations, or by recomputing corrupted coefficients. We show that both approaches can be used for RC. We provide a synthetic presentation of all methods before discussing their pros and cons. We have implemented all these methods with calls to optimized BLAS routines, and we provide performance data for a wide range of failure rates and matrix sizes.
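
To make the checksum idea behind ABFT concrete, here is a minimal sketch of the classical row/column checksum test for a product C = AB, with correction of a single corrupted coefficient. It is an illustration only and not the implementation evaluated in the paper: the function names (`matmul`, `abft_locate`, `abft_correct_single`) and the absolute tolerance `tol` are assumptions made for this example, and the code uses plain Python lists rather than BLAS calls. With real floating-point data the threshold would need to account for rounding error.

```python
def matmul(A, B):
    """Plain triple-loop product of two list-of-lists matrices."""
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def abft_locate(A, B, C, tol=1e-9):
    """Compare row/column sums of C with checksums derived from A and B.

    Returns (bad_rows, bad_cols, row_diff, col_diff).
    `tol` is an illustrative absolute threshold (an assumption).
    """
    n, k, m = len(A), len(B), len(B[0])
    # Expected row sums of C: A (B e), with e the all-ones vector.
    Be = [sum(B[p]) for p in range(k)]
    row_ref = [sum(A[i][p] * Be[p] for p in range(k)) for i in range(n)]
    # Expected column sums of C: (e^T A) B.
    eA = [sum(A[i][p] for i in range(n)) for p in range(k)]
    col_ref = [sum(eA[p] * B[p][j] for p in range(k)) for j in range(m)]
    # Discrepancy between observed and expected checksums.
    row_diff = [sum(C[i]) - row_ref[i] for i in range(n)]
    col_diff = [sum(C[i][j] for i in range(n)) - col_ref[j] for j in range(m)]
    bad_rows = [i for i, d in enumerate(row_diff) if abs(d) > tol]
    bad_cols = [j for j, d in enumerate(col_diff) if abs(d) > tol]
    return bad_rows, bad_cols, row_diff, col_diff


def abft_correct_single(A, B, C, tol=1e-9):
    """Fix one corrupted entry of C in place; return its (row, col) or None."""
    bad_rows, bad_cols, row_diff, _ = abft_locate(A, B, C, tol)
    if not bad_rows and not bad_cols:
        return None  # checksums agree: no error detected
    if len(bad_rows) != 1 or len(bad_cols) != 1:
        raise ValueError("more than one corrupted entry; recompute instead")
    i, j = bad_rows[0], bad_cols[0]
    # The row discrepancy equals the error injected into C[i][j].
    C[i][j] -= row_diff[i]
    return i, j


if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, 4.0]]
    B = [[5.0, 6.0], [7.0, 8.0]]
    C = matmul(A, B)
    C[1][0] += 3.0  # simulate a silent error in one coefficient
    print(abft_correct_single(A, B, C), C)
```

The checksums cost only matrix-vector products, which is what makes this kind of test cheap compared with replicating the full multiplication. When several coefficients are corrupted, the sketch simply refuses to correct; the abstract mentions solving a small linear system or recomputing the corrupted coefficients as the correction strategies in that setting.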
