Even though cloud platforms promise to be reliable, several availability incidents prove that they are not. How can we be sure that a parallel application finishes its execution even if a site is affected by a failure? This paper presents H-RADIC, an approach based on the RADIC architecture that executes a parallel application in at least 3 different virtual clusters or sites. The execution state of each site is saved periodically in another site and is recovered in case of failure. The paper details the configuration of the architecture and the experimental results using 3 virtual clusters running NAS parallel applications protected with DMTCP, a well-known distributed multi-threaded checkpoint tool. Our experiments show that the e...
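The cross-site protection idea in the abstract above (each site's state is saved periodically in another site and restored after a failure) can be illustrated with a minimal sketch. This is not the H-RADIC implementation and does not call DMTCP: the `Site` class, the ring placement rule in `protector_of`, and the integer stand-in for the execution state are all hypothetical, introduced only for illustration. The abstract says only that the state is saved "in another site"; choosing the next site in a ring is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical model of one virtual cluster (site)."""
    name: str
    state: int = 0                               # stand-in for the execution state
    stored: dict = field(default_factory=dict)   # checkpoints held for other sites
    alive: bool = True


def protector_of(sites, i):
    """Assumed placement rule: site i saves its checkpoint on site i+1 (ring)."""
    return sites[(i + 1) % len(sites)]


def checkpoint_all(sites):
    """Periodic step: every live site copies its state to its live protector."""
    for i, site in enumerate(sites):
        target = protector_of(sites, i)
        if site.alive and target.alive:
            target.stored[site.name] = site.state


def recover(sites, i):
    """Restart failed site i from the last checkpoint held by its protector."""
    site = sites[i]
    holder = protector_of(sites, i)
    if site.name not in holder.stored:
        raise RuntimeError(f"no checkpoint of {site.name} on {holder.name}")
    site.state = holder.stored[site.name]
    site.alive = True
    return site.state
```

With three sites A, B and C, a checkpoint taken when B's state is 5 is held by C; if B then advances to 9 and fails, `recover(sites, 1)` rolls B back to 5, so the work done since the last checkpoint is lost and must be re-executed.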
Transparent hypervisor-level checkpoint-restart mechanisms for virtual clusters (VCs) or clusters of...
MapReduce is a framework for processing large data sets much used in the context of cloud computing....
MapReduce is a popular distributed data-processing system for analyzing big data in cloud environmen...
Even though the cloud platform promises to be reliable, several availability incidents prove that th...
Even though the cloud platform promises to be reliable, several availability incidents prove that it...
The increasing failure rate in High Performance Computing encourages the investigation of fa...
The increasing failure rate in High Performance Computing encourages the investigation of fault tole...
Fault tolerance has become an important issue for parallel applications in the last few years. The p...
Because e-Science applications are data intensive and require long execution r...
The demand for computational power has been leading the improvement of the High Performance Computin...
Two areas are currently the focus of active research, namely cloud computing a...
Cloud computing is increasingly attracting huge attention both in academic research and industry ini...
Doctoral thesis, Informatics (Informatics Engineering), Universidade de Lisboa, Faculdade de Ciê...
In High Performance Computing (HPC) the demand for more performance is satisfied by increasing the n...
Cloud Computing offers the possibility of computing resources, allowing remote access to software, s...