International audience. Checkpointing is a classical technique to mitigate the overhead of adjoint Algorithmic Differentiation (AD). In the context of source-transformation AD with the Store-All approach, checkpointing reduces the peak memory consumption of the adjoint, at the cost of duplicate runs of selected pieces of the code. Checkpointing is vital for long-running codes, which include most MPI-parallel applications. However, the presence of MPI communications seriously restricts the application of checkpointing. In most attempts to apply checkpointing to adjoint MPI codes (the "popular" approach), a number of restrictions apply to the form of communications that occur in the checkpointed piece of code. In many works, these restri...
Distributed systems are often developed using the message passing paradigm, where the only way to...
Algorithmic Differentiation (AD) is a set of techniques to calculate derivatives of a computer progr...
The high failure rate expected for future supercomputers requires the design o...
Checkpointing is a classical technique to mitigate the overhead of adjoint Algorithmic Differentiati...
Checkpointing is a classical strategy to reduce the peak memory consumption of...
Collective MPI communications have to be executed in the same order by all pro...
As reported by many recent studies, the mean time between failures of future...
Nowadays most scientific applications are parallelized based on MPI communicat...
This work presents experience with traditional use cases of checkpointing on a novel platform. A sin...
A long-term trend in high-performance computing is the increasing number of no...
The Message Passing Interface (MPI) is the standard API for parallelization in high-performance and ...
MPI-3 provides functions for non-blocking collectives. To help programmers intr...
We reexamine the work of Stumm and Walther on multistage algorithms for adjoin...
Replication has recently gained attention in the context of fault tolerance fo...
Errors have become a critical problem for high performance computing. Checkpoi...