Due to the increasing number of their components, Scalable Shared Memory Multiprocessors (SSMMs) have a very high probability of experiencing failures. Tolerating node failures therefore becomes very important for these architectures, particularly if they must be used for long-running computations. In this paper, we show that the class of Cache Only Memory Architectures (COMA) is a good candidate for building fault-tolerant SSMMs. A backward error recovery strategy can be implemented without significant hardware modification to previously proposed COMAs by exploiting their standard replication mechanisms and extending the coherence protocol to transparently manage recovery data. Evaluation of the proposed fault-tolerant ...
Multiprocessors with shared memory are considered more general and easier to program than message-pa...
Two interesting variations of large-scale shared-memory machines that have recently emerged are cac...
As microprocessors become faster and demand more bandwidth, the already limited scalability of a shar...
Due to the increasing number of their components, Scalable Shared Memory Multiprocessors (SSMMs) hav...
COMAs (Cache Only Memory Architectures) are an interesting class of large scale shared memory mult...
Distributed Shared Memory (DSM) architectures are attractive to execute high p...
Distributed Shared Memory (DSM) architectures are attractive to execute high performance parallel ...
Scalable shared memory multiprocessors are promising architectures to achieve ...
We present design details and some initial performance results of a novel scalable shared memory mu...
Multiprocessors with shared memory are considered more general and easier to program than message-pa...
In this paper, we focus on the problem of recovering processor failures in shared memory multiproces...
The concept of backward recovery is now well established as a means of restoring a consistent state ...
Traditionally, tightly coupled multiprocessors allow data sharing between multiple caches by keeping...