FAIL-MPI: How fault-tolerant is fault-tolerant MPI?

Herault, Thomas
Hoarau, William
Lemarinier, Pierre
Rodriguez, Eric
Tixeuil, Sébastien

Publication date

May 2006

Publisher

HAL CCSD

Abstract

One of the topics of paramount importance in the development of Cluster and Grid middleware is the impact of faults, since their occurrence probability in a Grid infrastructure and in large-scale distributed systems is very high. MPI (Message Passing Interface) is a popular abstraction for programming distributed computation applications. FAIL is an abstract language for fault occurrence description capable of expressing complex and realistic fault scenarios. In this paper, we investigate the possibility of using FAIL to inject faults in a fault-tolerant MPI implementation. Our middleware, FAIL-MPI, is used to carry out quantitative and qualitative fault and stress testing.
