Crash and omission failures are common in service providers: a disk can break down or a link can fail at any time. In addition, the probability of a node failure increases with the number of nodes. Apart from reducing the provider’s computation power and jeopardizing the fulfillment of its contracts, such failures also waste computation time when a crash occurs before a task finishes executing. To avoid this problem, efficient checkpoint infrastructures are required, especially in virtualized environments, where these infrastructures must deal with huge virtual machine images. This paper proposes a smart checkpoint infrastructure for virtualized service providers. It uses Another Union File System (AUFS) to differentiate read-only fr...
The increasing number of cores on current supercomputers will quickly decrease the mean time to fail...
Embedded real-time virtualized systems serve a wide range of functions in many industries. They can ...
Communicated by Hiroyuki Sato. The program monitoring and control mechanisms of virtualization tools ...
Transparent hypervisor-level checkpoint-restart mechanisms for virtual clusters (VCs) or clusters of...
In this work, we present the design of the Checkpointing-Enabled Virtual Machine (CEVM) ar...
This study explores a recovery strategy using checkpointing in a distributed shared virtual memory (...
Checkpointing has been widely adopted in support of fault-tolerance and job migration essential for ...
Fault tolerance in cloud computing is considered one of the most vital issues to deliver reliable...
By leveraging enormous computational capabilities, scientists today are able to ...
As High Performance platforms (Clusters, Grids, etc.) continue to grow in size...
Checkpointing can store and recover applications when faults happen and is becoming critical to large ...
207 p. Thesis (Ph.D.), University of Illinois at Urbana-Champaign, 2008. This thesis presents research...
Checkpointing is a procedure for storing process state to a file, which is later used to re...