We provide a framework to analyze multi-level checkpointing protocols, by formally defining a $k$-level checkpointing pattern. We provide a first-order approximation to the optimal checkpointing period, and show that the corresponding overhead is in the order of $\sum_{\ell=1}^{k} \sqrt{2\lambda_\ell C_\ell}$, where $\lambda_\ell$ is the error rate at level $\ell$, and $C_\ell$ the checkpointing cost at level $\ell$. This nicely extends the classical Young/Daly formula on single-level checkpointing. Furthermore, we are able to fully characterize the shape of the optimal pattern (number and positions of checkpoints), and we provide a dynamic programming algorithm to determine the optimal subset of levels to be used. Finally, we perform simulations to check the accuracy of the theoretical ...
This paper revisits checkpointing strategies when workflows composed of multiple tasks execute on a ...
This work provides an optimal checkpointing strategy to protect iterative appl...
Checkpointing is an effective fault-tolerant technique for improving system availability and reliabi...
Checkpointing is commonly adopted for enhancing the performance of software applications that operat...