The concept of backward recovery is now well established as a means of restoring a consistent state of a fault-tolerant system when faults occur. In this paper, we consider a system of communicating processes mapped onto a multilevel execution support. We assume a shared memory multiprocessor machine. Our interest is in tolerating the hardware faults that may occur during the execution of a concurrent computation. The machine provides a hardware backward recovery protocol based on a specialized memory device that tracks dependencies between the processors accessing shared data residing in memory. The transparency provided by the protocol is discussed by successively considering the models of computation at the various levels of abstrac...
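The abstract only names the mechanism: a memory device that tracks inter-processor dependencies on shared data. To show why such tracking matters for backward recovery, here is a toy software model. It is a hypothetical sketch and not the paper's hardware design. All names (`DependencyTrackingMemory`, `rollback_set`, `recover`) are invented for illustration. The model makes the following assumptions:

- The memory logs a before-image for every write since the last global recovery point.
- A processor becomes dependent on another whenever it reads or overwrites data that the other wrote since that point.
- When a processor fails, it is rolled back together with every processor that transitively depends on it.

```python
from __future__ import annotations


class DependencyTrackingMemory:
    """Hypothetical toy model (not the paper's device): shared memory with
    before-image logging and inter-processor dependency tracking."""

    def __init__(self, n_procs: int) -> None:
        self.n_procs = n_procs
        self.mem: dict[int, int] = {}
        self.checkpoint()

    def checkpoint(self) -> None:
        # Establish a new global recovery point: current memory is committed.
        self.undo_log: list[tuple[int, int, int | None]] = []  # (proc, addr, before-image)
        self.last_writer: dict[int, int] = {}                   # addr -> last writing proc
        self.deps: dict[int, set[int]] = {p: set() for p in range(self.n_procs)}

    def _note_access(self, proc: int, addr: int) -> None:
        # Touching data last written by another processor makes `proc` depend on it.
        writer = self.last_writer.get(addr)
        if writer is not None and writer != proc:
            self.deps[proc].add(writer)

    def read(self, proc: int, addr: int) -> int:
        self._note_access(proc, addr)
        return self.mem.get(addr, 0)

    def write(self, proc: int, addr: int, value: int) -> None:
        self._note_access(proc, addr)  # write-after-write also creates a dependency
        self.undo_log.append((proc, addr, self.mem.get(addr)))
        self.mem[addr] = value
        self.last_writer[addr] = proc

    def rollback_set(self, failed: int) -> set[int]:
        # Transitive closure of processors that depend on the failed one.
        victims = {failed}
        changed = True
        while changed:
            changed = False
            for p, ds in self.deps.items():
                if p not in victims and ds & victims:
                    victims.add(p)
                    changed = True
        return victims

    def recover(self, failed: int) -> set[int]:
        """Roll the failed processor and its dependants back to the recovery point."""
        victims = self.rollback_set(failed)
        survivors: list[tuple[int, int, int | None]] = []
        # Walk the log newest-first so each address ends at its oldest before-image.
        for proc, addr, before in reversed(self.undo_log):
            if proc not in victims:
                survivors.append((proc, addr, before))
            elif before is None:
                self.mem.pop(addr, None)  # address was unset before this write
            else:
                self.mem[addr] = before
        survivors.reverse()
        # Keep survivors' log entries so they remain recoverable; rebuild bookkeeping.
        self.undo_log = survivors
        self.last_writer = {addr: proc for proc, addr, _ in survivors}
        for v in victims:
            self.deps[v] = set()
        return victims


# Usage: P1 consumes a value produced by P0, while P2 works independently.
mem = DependencyTrackingMemory(n_procs=3)
mem.write(0, 100, 7)
x = mem.read(1, 100)      # P1 now depends on P0
mem.write(1, 200, x + 1)
mem.write(2, 300, 5)
victims = mem.recover(failed=0)
```

In this sketch, a failure of P0 also forces P1 back to the recovery point, while P2's work survives. The model treats an overwrite of another processor's uncommitted data as a dependency. Without that rule, undoing the earlier writer's before-image could silently erase a surviving processor's later write.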
COMAs (Cache Only Memory Architectures) are an interesting class of large scale shared memory mult...
In order to deploy a tightly-coupled multiprocessor (TCMP) in the commercial world, the TCMP must be...
Fault tolerance in distributed shared memory through replication has yet to be explored. This resear...
In this paper, we focus on the problem of recovering from processor failures in shared memory multiproces...
Scalable shared memory multiprocessors are promising architectures to achieve ...
This thesis focuses on the issue of reliability and fault tolerance in Distributed Shared Memory Mul...
Backward error recovery involving checkpointing and restart of tasks is an important component of an...
IRISA, Internal Publication no. 647, 40 pp., March 1992 (SIGLE). Available at INIST (FR), Document Supply Se...
Traditionally, tightly coupled multiprocessors allow data sharing between multiple caches by keeping...
Due to the increasing number of their components, Scalable Shared Memory Multi...
The scale of parallel computing systems is rapidly approaching dimensions where fault tolerance can...
In this paper, we describe new protocols augmenting traditional cache coherency mechanisms to implem...
For implementing fault-tolerance in multicomputer systems, backward error recovery, based on checkpo...