Compiler-assisted staggered checkpointing

Norman, Alison Nicholas

Publication date

November 2010

Abstract

To make progress in the face of failures, long-running parallel applications need to save their state, known as a checkpoint. Unfortunately, current checkpointing techniques are becoming untenable on large-scale supercomputers. Many applications checkpoint all processes simultaneously, a technique that is easy to implement but often saturates the network and file system, causing a significant increase in checkpoint overhead. This thesis introduces compiler-assisted staggered checkpointing, where processes checkpoint at different places in the application text, thereby reducing contention for the network and file system. This checkpointing technique is algorithmically challenging since the number of possible solutions is enormous and the...
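The contention argument in the abstract can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the thesis's compiler analysis: it assumes every process writes one checkpoint of equal duration, and it models "checkpointing at different places in the application text" simply as a time offset per group of processes. The helper names (`peak_concurrent_writers`, `simultaneous_schedule`, `staggered_schedule`) and the grouping scheme are hypothetical. The quantity compared is the peak number of processes writing at once, a rough proxy for load on the network and file system.

```python
# Toy contention model (hypothetical, not taken from the thesis).
# Assumption: each process writes one checkpoint lasting `duration`, and
# staggering is modeled as a fixed time offset per group of processes.


def peak_concurrent_writers(start_times, duration):
    """Return the maximum number of checkpoint writes that overlap in time."""
    events = []
    for t in start_times:
        events.append((t, 1))              # a write begins
        events.append((t + duration, -1))  # that write ends
    # Sort by time; at equal times, ends (-1) come before starts (+1),
    # so back-to-back writes are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))
    active = peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak


def simultaneous_schedule(num_procs):
    # Every process checkpoints at the same point in the program.
    return [0.0] * num_procs


def staggered_schedule(num_procs, num_groups, gap):
    # Hypothetical staggering: process p joins group p % num_groups, and each
    # group checkpoints `gap` time units after the previous one.
    return [(p % num_groups) * gap for p in range(num_procs)]


if __name__ == "__main__":
    n, duration = 64, 1.0
    print(peak_concurrent_writers(simultaneous_schedule(n), duration))
    print(peak_concurrent_writers(staggered_schedule(n, 8, duration), duration))
```

With 64 processes, the simultaneous schedule has all 64 writers active at once, while splitting them into 8 staggered groups caps the peak at 8. The model ignores what makes the real problem hard, namely choosing checkpoint locations in the program text that still yield a consistent recovery state, which is where the abstract notes the solution space becomes enormous.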
