Distributed Shared Memory (DSM) architectures are attractive for executing high-performance parallel applications. Because they are made up of a large number of components, however, these architectures have a high probability of failure. We propose a protocol to tolerate node failures in two classes of DSM architectures: Cache Only Memory Architectures (COMA) and Shared Virtual Memory (SVM) systems. The proposed solution is based on backward error recovery and extends the existing coherence protocols to manage both the data used by processors for computation and the recovery data used for fault tolerance. The implementation of the protocol in a COMA architecture has been evaluated by simulation. The protocol has also been implemented in ...
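The protocol described above keeps two kinds of data in the coherence layer: current copies used for computation and recovery copies saved at checkpoints. The following Python sketch is a toy model of that idea, not the authors' protocol. The names `RecoverableMemory` and `Block` are hypothetical. It assumes that each checkpoint stores every block's recovery copy on two distinct nodes, so that a single node failure leaves one copy. It also assumes that at least two nodes are alive at checkpoint time and that a failure rolls every block back to the last checkpoint.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Block:
    """One coherence unit (e.g. a COMA item or an SVM page); hypothetical model."""
    value: int
    owners: set[int] = field(default_factory=set)          # nodes holding the current copy
    recovery_value: int | None = None                      # value saved at the last checkpoint
    recovery_nodes: set[int] = field(default_factory=set)  # nodes holding the recovery copy


class RecoverableMemory:
    """Toy shared memory whose coherence layer also tracks recovery copies.

    Assumption (not from the source): each checkpoint stores a block's recovery
    copy on two distinct live nodes, so one node failure leaves a surviving copy.
    """

    def __init__(self, nodes: list[int]) -> None:
        self.live = set(nodes)
        self.blocks: dict[str, Block] = {}

    def write(self, name: str, value: int, node: int) -> None:
        # Ordinary coherence action: the writer becomes the sole owner of the current copy.
        blk = self.blocks.setdefault(name, Block(value))
        blk.value = value
        blk.owners = {node}

    def checkpoint(self) -> None:
        # Turn current data into recovery data, replicated on two distinct live nodes
        # (requires at least two live nodes).
        for blk in self.blocks.values():
            primary = min(blk.owners & self.live)
            backup = min(self.live - {primary})
            blk.recovery_value = blk.value
            blk.recovery_nodes = {primary, backup}

    def fail(self, node: int) -> None:
        # Backward error recovery: drop the failed node and roll every block back
        # to its value at the last checkpoint.
        self.live.discard(node)
        for name in list(self.blocks):
            blk = self.blocks[name]
            if blk.recovery_value is None:
                # Block created after the last checkpoint: the rollback discards it.
                del self.blocks[name]
                continue
            survivors = blk.recovery_nodes & self.live
            if not survivors:
                raise RuntimeError(f"all recovery copies of {name!r} were lost")
            blk.value = blk.recovery_value
            blk.owners = set(survivors)
            # Re-replicate the recovery copy on a spare live node, if one exists.
            spare = self.live - survivors
            blk.recovery_nodes = survivors | ({min(spare)} if spare else set())
```

For example, take nodes `[0, 1, 2]` and run `write("x", 1, node=0)`, `checkpoint()`, `write("x", 2, node=0)` and then `fail(0)`. Block `"x"` is rolled back to 1, node 1 holds its current copy, and the recovery copy is re-replicated on nodes 1 and 2.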
This paper shows how a state-of-the-art software distributed shared-memory (DSM) protocol can be eff...
Distributed Shared Memory (DSM) systems combine the ease of programming of Shared Memory Parallel Co...
COMAs (Cache Only Memory Architectures) are an interesting class of large-scale shared memory mult...
Due to the increasing number of their components, Scalable Shared Memory Multiprocessors (SSMMs) hav...
This thesis focuses on the issue of reliability and fault tolerance in Distributed Shared Memory Mul...
Distributed Shared Memory (DSM) systems are becoming increasingly significant as a result of be...
In this paper, we focus on the problem of recovering from processor failures in shared memory multiproces...
Backward error recovery involving checkpointing and restart of tasks is an important component of an...
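Checkpointing and restart of tasks, as mentioned above, can be illustrated for a single sequential task. This is a minimal Python sketch under stated assumptions, not the method of the cited work. The task is a list of steps that mutate a state dictionary, and a checkpoint is a deep copy taken every `interval` steps. Failures are injected at hypothetical step indices (`fails_at`). The sketch deliberately ignores the hard part of parallel backward error recovery, which is keeping the checkpoints of communicating tasks mutually consistent.

```python
from __future__ import annotations

import copy
from typing import Callable


def run_with_checkpoints(
    state: dict,
    steps: list[Callable[[dict], None]],
    interval: int,
    fails_at: set[int],
) -> tuple[dict, int]:
    """Run `steps` in order on a private copy of `state`, checkpointing every
    `interval` completed steps.

    `fails_at` lists hypothetical step indices where a transient failure is
    injected the first time execution reaches them. Returns the final state and
    the number of steps that had to be re-executed.
    """
    state = copy.deepcopy(state)
    checkpoint = (0, copy.deepcopy(state))  # (next step index, saved state)
    pending_failures = set(fails_at)
    i = 0
    redone = 0
    while i < len(steps):
        if i in pending_failures:
            pending_failures.discard(i)
            # Backward error recovery: discard the work done since the checkpoint
            # and restart from the saved state.
            ckpt_index, saved = checkpoint
            redone += i - ckpt_index
            i = ckpt_index
            state = copy.deepcopy(saved)
            continue
        steps[i](state)
        i += 1
        if i % interval == 0:
            checkpoint = (i, copy.deepcopy(state))
    return state, redone
```

Consider ten steps that each add their index to a running total, with `interval=4` and a failure injected before step 6. The task rolls back to the checkpoint taken after the first four steps, re-executes two steps, and still finishes with a total of 45.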
Large-scale distributed systems are very attractive for the execution of parallel applications requi...
With the advent of large networks and the demand for uninterrupted service, there is a pressing ...
Scalable shared memory multiprocessors are promising architectures to achieve ...