Input/output (I/O) from various sources often contends for scarce bandwidth. For example, checkpoint/restart (CR) protocols help ensure application progress in failure-prone environments. However, CR I/O alongside an application's normal, requisite I/O can increase I/O contention and degrade performance. In this work, we consider different aspects (system-level scheduling policies and hardware) that optimize the overall performance of concurrently executing CR-based applications that share I/O resources. We provide a theoretical model and derive a set of necessary constraints to minimize the global waste on a given platform. Our results demonstrate that Young/Daly's optimal checkpoint ...
In this paper, we design and analyze strategies to replicate the execution of ...
The large scale of current and next-generation massively parallel processing (MPP) systems presents ...
Efficient checkpointing of distributed data structures periodically at key mom...
In high-performance computing environments, input/output (I/O) from various sources often contends for...