The contributions of this paper are the following. We describe the implementation of the $C^3$ system for semi-automatic application-level checkpointing of C programs. The system has (i) a pre-compiler that instruments C programs so that they can save their states at program execution points specified by the user, and (ii) a novel memory allocator that manages the heap as a collection of pools. We describe two static analyses for reducing the overhead of saving and restoring the application state. The first one optimizes stack variables, while the second one optimizes heap data structures. To benchmark our system, we compare the overheads introduced by our semi-automatic approach with the overhead of handwritten application-level checkpoin...
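The following C sketch shows the general idea of pre-compiler instrumentation for application-level checkpointing. It assumes a simple runtime with hypothetical names (`ckpt_push`, `ckpt_pop`, `ckpt_save`, `ckpt_restore`) that keeps a stack of descriptors for the local variables currently in scope. The instrumented function registers its locals on entry, saves them at a user-chosen point, and unregisters them on exit. This is not the actual output of the $C^3$ pre-compiler. It leaves out re-creating the call stack on restart and saving heap data, which in $C^3$ is the job of the pool-based allocator.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical runtime: a stack of descriptors for variables in scope. */
#define MAX_VARS 64

typedef struct {
    void  *addr;   /* address of the variable */
    size_t size;   /* size in bytes */
} var_desc;

static var_desc vds[MAX_VARS];
static int vds_top = 0;

/* Register a variable (sketch: no overflow check). */
static void ckpt_push(void *addr, size_t size) {
    vds[vds_top].addr = addr;
    vds[vds_top].size = size;
    vds_top++;
}

/* Unregister the n most recently registered variables. */
static void ckpt_pop(int n) { vds_top -= n; }

/* Write every registered variable, in registration order. */
static int ckpt_save(FILE *f) {
    for (int i = 0; i < vds_top; i++)
        if (fwrite(vds[i].addr, vds[i].size, 1, f) != 1) return -1;
    return 0;
}

/* Read values back into the currently registered variables, same order. */
static int ckpt_restore(FILE *f) {
    for (int i = 0; i < vds_top; i++)
        if (fread(vds[i].addr, vds[i].size, 1, f) != 1) return -1;
    return 0;
}

/* What an instrumented user function might look like. */
static double sum_to(int n, FILE *ckpt) {
    double acc = 0.0;
    int i;
    ckpt_push(&acc, sizeof acc);        /* inserted on entry: register locals */
    ckpt_push(&i, sizeof i);
    for (i = 0; i < n; i++) {
        acc += i;
        if (i == n / 2 && ckpt != NULL) /* user-chosen checkpoint location */
            ckpt_save(ckpt);
    }
    ckpt_pop(2);                        /* inserted on exit: unregister locals */
    return acc;
}
```

For example, `sum_to(10, f)` with `f = tmpfile()` returns 45. It also leaves in `f` the values `acc == 15.0` and `i == 5` that were captured at the checkpoint. To restore them, variables of the same types must be registered in the same order before calling `ckpt_restore`.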
A checkpoint is defined as a designated place in a program at which normal processing is interrupted s...
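The following is a minimal sketch of checkpointing at a designated place in a program. It assumes a simple iterative computation whose entire state fits in one struct. The names `app_state`, `save_checkpoint`, `load_checkpoint` and `run` are hypothetical. Each checkpoint is written to a temporary file and then renamed into place, so a crash in the middle of a write does not corrupt the previous checkpoint. This relies on POSIX `rename` semantics, which replace an existing file. A failure is simulated by returning early at iteration `stop_at`. The file holds raw struct bytes, so only the same build on the same platform can read it back.

```c
#include <stdio.h>

/* Hypothetical application state saved at the checkpoint. */
typedef struct {
    long   iter;    /* next iteration to execute */
    double value;   /* result accumulated over earlier iterations */
} app_state;

/* Write to "<path>.tmp", then rename over path (assumes POSIX rename). */
static int save_checkpoint(const char *path, const app_state *s) {
    char tmp[256];
    snprintf(tmp, sizeof tmp, "%s.tmp", path);
    FILE *f = fopen(tmp, "wb");
    if (!f) return -1;
    int ok = fwrite(s, sizeof *s, 1, f) == 1;
    if (fclose(f) != 0) ok = 0;
    if (!ok) { remove(tmp); return -1; }
    return rename(tmp, path) == 0 ? 0 : -1;
}

/* Returns 0 and fills *s if a readable checkpoint exists, -1 otherwise. */
static int load_checkpoint(const char *path, app_state *s) {
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    int ok = fread(s, sizeof *s, 1, f) == 1;
    fclose(f);
    return ok ? 0 : -1;
}

/* Sum 0..total-1, checkpointing every `every` iterations.
   Returns -1.0 when the simulated failure at iteration stop_at occurs. */
static double run(const char *path, long total, long every, long stop_at) {
    app_state s;
    if (load_checkpoint(path, &s) != 0) {   /* no checkpoint: start fresh */
        s.iter = 0;
        s.value = 0.0;
    }
    for (; s.iter < total; s.iter++) {
        if (s.iter % every == 0)            /* designated checkpoint location */
            save_checkpoint(path, &s);
        if (s.iter == stop_at)              /* simulated failure */
            return -1.0;
        s.value += (double)s.iter;          /* the actual work */
    }
    return s.value;
}
```

`run(path, 100, 10, 25)` stops at iteration 25. At that point the last checkpoint is from iteration 20. A second call, `run(path, 100, 10, -1)`, resumes from iteration 20 and returns 4950, which is the same result an uninterrupted run produces.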
With the evolution of high-performance computing towards heterogeneous, massively parallel systems,...
Trends in high-performance computing are making it necessary for long-running applications to toler...
Abstract. As modern supercomputing systems reach the peta-flop performance range, they grow in both ...
In this paper we present compiler-assisted checkpointing, a new technique which uses static program ...
This thesis examines the feasibility of applying compile-time information to assist in rollback reco...
Checkpointing support allows program execution to roll back to an earlier program point, discarding ...
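Within a single process, you can illustrate the idea of rolling back to an earlier program point and discarding the work done since then with standard C `setjmp`/`longjmp` plus a copy of the program state. This is a toy sketch and does not describe any particular checkpointing system. The names `state`, `snapshot`, `risky_update` and `run_with_rollback` are hypothetical, and a "failure" is modeled as an update that pushes a counter past a limit. The state is kept in static storage because non-`volatile` automatic variables that are modified between `setjmp` and `longjmp` have indeterminate values after the jump.

```c
#include <setjmp.h>
#include <string.h>

/* Hypothetical program state that a rollback should restore. */
typedef struct {
    int data[4];
    int count;
} state_t;

/* Static storage: non-volatile locals changed between setjmp and
   longjmp would be indeterminate after the jump. */
static state_t state;       /* live state */
static state_t snapshot;    /* copy taken at the checkpoint */
static jmp_buf ckpt_point;  /* program point to roll back to */
static int rollbacks;       /* rollbacks performed in the current run */

/* Apply one update; exceeding `limit` counts as a failure and
   transfers control back to the checkpoint. */
static void risky_update(int v, int limit) {
    state.data[state.count % 4] = v;
    state.count++;
    if (state.count > limit)
        longjmp(ckpt_point, 1);
}

/* Assumes limit >= 2; otherwise the retry would fail forever. */
static int run_with_rollback(int limit) {
    memset(&state, 0, sizeof state);
    rollbacks = 0;
    snapshot = state;                 /* checkpoint: copy the state ... */
    if (setjmp(ckpt_point) != 0) {    /* ... and record the program point */
        state = snapshot;             /* discard updates made since then */
        rollbacks++;
    }
    /* The first attempt tries 5 updates; after a rollback, retry with 2. */
    int n = (rollbacks == 0) ? 5 : 2;
    for (int i = 0; i < n; i++)
        risky_update(10 * (i + 1), limit);
    return state.count;
}
```

With `limit = 3`, the first attempt fails on its fourth update. Execution then returns to the `setjmp` point, `state` is restored from `snapshot`, and the retry performs two updates. As a result, `run_with_rollback(3)` returns 2 with `rollbacks == 1`.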
Checkpoint and Recovery (CPR) systems have many uses in high-performance computing. Because of this,...
To make progress in the face of failures, long-running parallel applications need to save their ...
As modern supercomputing systems reach the peta-flop performance range, they grow in both size and ...
Abstract. As modern supercomputing systems reach the peta-flop performance range, they grow in both...
As modern supercomputing systems reach the peta-flop performance range, they grow in both size and c...