In this paper, we focus on the problem of recovering from processor failures in shared memory multiprocessors. We propose an architecture designed to tolerate processor failures transparently. The recoverable shared memory (RSM) is the main component of this architecture; it provides a hardware-supported backward error recovery mechanism. This technique copes with standard caches and cache coherence protocols and avoids rollback propagation. The performance of the architecture during normal execution is evaluated and compared with that of existing fault-tolerant shared memory multiprocessors. The performance study was conducted by simulation using address traces collected from real parallel applications.
Traditionally, tightly coupled multiprocessors allow data sharing between multiple caches by keeping...
In this paper, we describe new protocols augmenting traditional cache coherency mechanisms to implem...
Most recovery schemes that have been proposed for Distributed Shared Memory (DSM) systems require un...
Scalable shared memory multiprocessors are promising architectures to achieve ...
Distributed Shared Memory (DSM) architectures are attractive to execute high p...
Distributed Shared Memory (dsm) architectures are attractive to execute high performance parallel ...
Due to the increasing number of their components, Scalable Shared Memory Multi...
This thesis focuses on the issue of reliability and fault tolerance in Distributed Shared Memory Mul...
The concept of backward recovery is now well established as a means of restoring a consistent state ...
Due to the increasing number of their components, Scalable Shared Memory Multiprocessors (SSMMs) hav...
COMAs (Cache Only Memory Architectures) are an interesting class of large scale shared memory mult...
This thesis examines memory management and rollback recovery in parallel architectures. Three memory...
In order to deploy a tightly-coupled multiprocessor (TCMP) in the commercial world, the TCMP must be...
A network multicomputer is a multiprocessor in which the processors are connected by general-purpose...