Fault Tolerance in MPI Programs

Ewing Lusk

Publication date

January 2002

Abstract

This paper examines the topic of writing fault-tolerant MPI applications. We discuss the meaning of fault tolerance in general and what the MPI Standard has to say about it. We survey several approaches to this problem, namely checkpointing, restructuring a class of standard MPI programs, modifying MPI semantics, and extending the MPI specification. We conclude that within certain constraints, MPI can provide a useful context for writing application programs that exhibit significant degrees of fault tolerance.
