Crash and omission failures are common in service providers: a disk can break down or a link can fail at any time. In addition, the probability that some node fails increases with the number of nodes. Apart from reducing the provider’s computing power and jeopardizing the fulfillment of its contracts, a failure also wastes computation time when the crash occurs before a task finishes executing. To avoid this problem, efficient checkpoint infrastructures are required, especially in virtualized environments, where they must deal with huge virtual machine images. This paper proposes a smart checkpoint infrastructure for virtualized service providers. It uses Another Union File System to differentiate read-only...
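The claim that failure probability grows with the number of nodes can be illustrated with a minimal sketch. It assumes, purely for illustration, that each node fails independently with the same probability over some fixed time window; the abstract states no such model, and the function name and parameters below are hypothetical.

```python
def prob_any_failure(p_node: float, n_nodes: int) -> float:
    """Probability that at least one of n_nodes fails.

    Assumption (not from the paper): nodes fail independently, each with
    probability p_node over the same time window.
    """
    if not 0.0 <= p_node <= 1.0:
        raise ValueError("p_node must be in [0, 1]")
    if n_nodes < 0:
        raise ValueError("n_nodes must be non-negative")
    # Complement of "no node fails": (1 - p)^n
    return 1.0 - (1.0 - p_node) ** n_nodes


if __name__ == "__main__":
    # Show how the chance of seeing some failure rises with cluster size.
    for n in (1, 10, 100, 1000):
        print(n, prob_any_failure(0.001, n))
```

Under this assumption the result never decreases as nodes are added, which is the sense in which larger providers see failures more often.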
The increasing number of cores on current supercomputers will quickly decrease the mean time to fail...
Embedded real-time virtualized systems serve a wide range of functions in many industries. They can ...
Fine-Grained Cycle Sharing (FGCS) systems aim at utilizing the large amount of idle computational re...
Transparent hypervisor-level checkpoint-restart mechanisms for virtual clusters (VCs) or clusters of...
In this work, we present the design of the Checkpointing-Enabled Virtual Machine (CEVM) ar...
Checkpointing has been widely adopted in support of fault-tolerance and job migration essential for ...
This study explores a recovery strategy using checkpointing in a distributed shared virtual memory (...
As High Performance platforms (Clusters, Grids, etc.) continue to grow in size...
Fault tolerance in cloud computing is considered one of the most vital issues in delivering reliable...
By leveraging enormous computational capabilities, scientists today are able to ...
207 p. Thesis (Ph.D.), University of Illinois at Urbana-Champaign, 2008. This thesis presents research...
Checkpointing can save and recover applications when faults occur and is becoming critical to large ...
Checkpointing is a procedure of storing process state to a file, which is later used to re...