Checkpointing saves application state so that applications can be recovered when faults occur, and it is becoming critical to large information systems. Unfortunately, existing checkpoint tools have limitations such as lack of transparency to applications, ignoring file system state, and poor support for cluster checkpointing. We present a lightweight cluster checkpoint mechanism based on OS virtualization. First, a virtual container, the IPG (Isolated Process Group), wraps all target applications together and produces checkpoints transparently and completely. Second, each IPG has an independent namespace built on an exclusively owned LV (Logical Volume), which can be checkpointed synchronously with the IPG's memory to guarantee consistency. Finally, distr...
High performance computing (HPC) systems use checkpoint-restart to tolerate failures. Typically, app...
Checkpoint and recovery protocols are commonly used in distributed applications for providing fault ...
Infrastructure-as-a-Service (IaaS) cloud computing is gaining significant inte...
Abstract: Checkpointing is a procedure of storing process state to a file, which is later used to re...
Transparent hypervisor-level checkpoint-restart mechanisms for virtual clusters (VCs) or clusters of...
In a scientific community that increasingly relies upon High Performance Computing (HPC) for large s...
Nowadays, clusters are widely used to execute scientific applications. These applications are often ...
Crash and omission failures are common in service providers: a disk can break down or a link can fai...
Abstract—Nowadays, clusters are widely used to execute scientific applications. These applications a...
Abstract — Nowadays, clusters are widely used to execute scientific applications. These applications...
By leveraging enormous computational capabilities, scientists today are able to ...
This study explores a recovery strategy using checkpointing in a distributed shared virtual memory (...
Abstract- In this work, we present the design of the Checkpointing-Enabled Virtual Machine (CEVM) ar...
Creating checkpoints of a distributed cluster operating system is a non-trivial task, as special coo...