Due to the increasing number of their components, Scalable Shared Memory Multiprocessors (SSMMs) have a very high probability of experiencing failures. Tolerating node failures therefore becomes very important for these architectures, particularly if they must be used for long-running computations. In this paper, we show that the class of Cache Only Memory Architectures (COMA) is a good candidate for building fault-tolerant SSMMs. A backward error recovery strategy can be implemented without significant hardware modification to previously proposed COMAs by exploiting their standard replication mechanisms and extending the coherence protocol to transparently manage recovery data. Evaluation of the proposed fault-tolerant ...
This thesis focuses on the issue of reliability and fault tolerance in Distributed Shared Memory Mul...
Multiprocessors with shared memory are considered more general and easier to program than message-pa...
The concept of backward recovery is now well established as a means of restoring a consistent state ...
COMAs (Cache Only Memory Architectures) are an interesting class of large scale shared memory mult...
Distributed Shared Memory (DSM) architectures are attractive to execute high p...
Distributed Shared Memory (DSM) architectures are attractive to execute high performance parallel ...
In this paper, we focus on the problem of recovering processor failures in shared memory multiproces...
Scalable shared memory multiprocessors are promising architectures to achieve ...
We present design details and some initial performance results of a novel scalable shared memory mul...
Multiprocessors with shared memory are considered more general and easier to program than message-pa...
In this paper, we describe new protocols augmenting traditional cache coherency mechanisms to implem...
Two interesting variations of large-scale shared-memory machines that have recently emerged are cac...
As microprocessors become faster and demand more bandwidth, the already limited scalability of a shar...