This work provides an optimal checkpointing strategy to protect iterative applications from fail-stop errors. We consider a general framework in which the application repeats the same execution pattern over consecutive iterations, and each iteration is composed of several tasks. These tasks have different execution times and different checkpoint costs. Assume that there are $n$ tasks and that task $a_i$, where $0 \le i < n$, has execution time $t_i$ and checkpoint cost $c_i$. A naive strategy would checkpoint after each task. Another naive strategy would checkpoint at the end of each iteration. A strategy inspired by the Young/Daly formula would work for $\sqrt{2\mu c_{\mathrm{ave}}}$ seconds, where $\mu$ is the application MTBF and $c_{\mathrm{ave}}$ is ...
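As a rough illustration of the strategies listed above, the Python sketch below evaluates the expected time per iteration for a given set of checkpointed tasks. It assumes exponentially distributed failures with MTBF μ, no downtime and no recovery cost, a checkpoint pattern that repeats identically every iteration, and that c_ave is the mean checkpoint cost (the abstract is cut off before defining it). The function names are hypothetical, the greedy Young/Daly rendering is one possible reading of "work for about that long, then checkpoint at the next task boundary", and `best_subset` is a plain exhaustive search under these simplifications, not the optimal algorithm of the paper.

```python
import math
from itertools import combinations

def segment_time(work, ckpt, mu):
    """Expected time to run `work` seconds then a checkpoint of `ckpt` seconds,
    re-executing the whole segment after each failure.
    Assumes exponential failures with MTBF `mu`, no downtime, no recovery cost."""
    return mu * math.expm1((work + ckpt) / mu)

def iteration_time(t, c, ckpts, mu):
    """Expected time per iteration when checkpointing after the tasks in `ckpts`.
    The same (non-empty) set is reused every iteration, so the segment after the
    last checkpoint wraps around into the next iteration."""
    ckpts = sorted(ckpts)
    n = len(t)
    total = 0.0
    for k, end in enumerate(ckpts):
        prev = ckpts[k - 1]               # k == 0 wraps to the last checkpoint
        length = (end - prev) % n or n    # a single checkpoint covers all n tasks
        work = sum(t[(prev + 1 + j) % n] for j in range(length))
        total += segment_time(work, c[end], mu)
    return total

def young_daly_period(mu, c):
    # Assumption: c_ave is the arithmetic mean of the checkpoint costs.
    return math.sqrt(2 * mu * sum(c) / len(c))

def young_daly_checkpoints(t, c, mu):
    """Hypothetical greedy rendering of the Young/Daly-inspired strategy:
    accumulate work and checkpoint at the first task boundary where the
    accumulated work reaches the period."""
    period = young_daly_period(mu, c)
    ckpts, acc = [], 0.0
    for i, ti in enumerate(t):
        acc += ti
        if acc >= period:
            ckpts.append(i)
            acc = 0.0
    return tuple(ckpts) if ckpts else (len(t) - 1,)

def best_subset(t, c, mu):
    """Exhaustive search over all non-empty checkpoint sets (O(2^n), small n only)."""
    candidates = (s for size in range(1, len(t) + 1)
                  for s in combinations(range(len(t)), size))
    return min((iteration_time(t, c, s, mu), s) for s in candidates)

if __name__ == "__main__":
    t = [100, 300, 50, 200]   # hypothetical task times (s)
    c = [20, 60, 5, 40]       # hypothetical checkpoint costs (s)
    mu = 1000.0               # hypothetical MTBF (s)
    n = len(t)
    print("after each task     :", iteration_time(t, c, range(n), mu))
    print("end of iteration    :", iteration_time(t, c, [n - 1], mu))
    print("Young/Daly-inspired :", iteration_time(t, c, young_daly_checkpoints(t, c, mu), mu))
    print("exhaustive best     :", best_subset(t, c, mu))
```

The wrap-around in `iteration_time` reflects that a failure striking after the last checkpoint of an iteration rolls execution back to that checkpoint, so those tasks are grouped with the first tasks of the next iteration. Exhaustive search is exponential in $n$ and only serves as a reference point on small instances.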
The Young/Daly formula for periodic checkpointing is known to hold for a divis...
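For reference, the standard first-order argument behind the Young/Daly period can be written as follows. It assumes a checkpoint cost $C$, an MTBF $\mu$, a period $T$ between checkpoints, failures striking on average halfway through a period, and it neglects downtime, recovery, and failures during checkpoints; the truncated abstract does not say which of these assumptions it revisits.

```latex
\begin{align*}
\mathrm{Waste}(T) &\approx \frac{C}{T} + \frac{T}{2\mu}
  && \text{(checkpoint overhead + expected lost work per failure)}\\
\frac{d\,\mathrm{Waste}}{dT} &= -\frac{C}{T^2} + \frac{1}{2\mu} = 0
  \;\Longrightarrow\; T_{\mathrm{YD}} = \sqrt{2\mu C},\\
\mathrm{Waste}(T_{\mathrm{YD}}) &\approx \sqrt{\frac{C}{2\mu}} + \sqrt{\frac{C}{2\mu}} = \sqrt{\frac{2C}{\mu}}.
\end{align*}
```

Substituting an average checkpoint cost for $C$ gives the $\sqrt{2\mu c_{\mathrm{ave}}}$ period used by the Young/Daly-inspired strategy for iterative applications.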
This paper revisits replication coupled with checkpointing for fail-stop errors. Replication enables...
We provide a framework to analyze multi-level checkpointing protocols, by form...