A grid infrastructure is a large set of geographically distributed nodes connected by a communication network. In this context, fault tolerance is a necessity imposed by distribution, which raises a number of problems: the heterogeneity of hardware, operating systems, networks, middleware, and applications; dynamic resources; scalability; the lack of shared memory; the lack of a common clock; and asynchronous communication between processes. To improve the robustness of supercomputing applications in the presence of failures, many techniques have been developed to make the system resistant to these faults. Fault tolerance is intended to allow the system to deliver its specified service despite the occurrence of faults. It app...
By leveraging enormous computational capabilities, scientists today are able to ...
Checkpoint and recovery protocols are commonly used in distributed applications for providing fault ...
Jobs in Grid workflows are exposed to different types of failure. It is important to develop fault t...
A grid of computing nodes has emerged as a representative means of connecting distributed computers or...
As high performance platforms (Clusters, Grids, etc.) continue to grow in size...
Grid computing pools more computing resources working in a calculation or...
The GridRPC model is well suited for high performance computing on grids thanks to efficie...
Performance evaluation of checkpoint rollback recovery strategies for distributed systems is a field...
As High Performance platforms (Clusters, Grids, etc.) continue to grow in size...
The EU-funded XtreemOS project implements an open-source grid operating system...
Also available as INRIA Research Report 5091: http://www.inria.fr/rrrt/rr-5091.html. A new kind of ...