This short paper deals with parallel scientific applications that use non-blocking, periodic coordinated checkpointing to enforce resilience. We provide a model and detailed formulas for total execution time and consumed energy. We characterize the optimal checkpointing period for each objective, and we assess the range of time/energy trade-offs by instantiating the model with a set of realistic scenarios for Exascale systems. We place particular emphasis on I/O transfers, because the relative cost of communication is expected to increase dramatically, in terms of both latency and consumed energy, on future Exascale platforms.
In this paper, we design and analyze strategies to replicate the execution of ...
Input/output (I/O) from various sources often contend for scarcely available b...
The large scale of current and next-generation massively parallel processing (MPP) systems presents ...
Over the last decade, computing systems have turned to large-scale parallel platforms composed of thousand...
We study programs which operate in the presence of possible failures and which must be restarted fro...
The massive scale of current and next-generation massively parallel processing (MPP) systems present...
Long-running software may operate on hardware platforms with limited energy resources such as batter...