A split-cache memory architecture is presented that provides efficient architectural support for checkpointing and roll-forward recovery in distributed systems. Unlike existing techniques, the approach requires neither a discrete stable storage unit nor explicit synchronization among the processors. Checkpointing is nonscheduled: checkpoints are established according to a cache-line replacement policy rather than by the conventionally used periodic checkpoint establishment protocols.
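The idea of a replacement-driven (nonscheduled) checkpoint can be illustrated with a small software model. This is only a sketch under stated assumptions, not the architecture described in the abstract: it assumes a write-back cache with LRU replacement, and it assumes that a checkpoint is established when a dirty line is about to be replaced. The class `CheckpointingCache` and all of its members are hypothetical names introduced for illustration. The split-cache organization and roll-forward recovery are not modeled.

```python
from collections import OrderedDict


class CheckpointingCache:
    """Toy write-back cache (hypothetical model, not the paper's design).

    Assumption: a checkpoint is established whenever a dirty line
    must be replaced, instead of on a periodic timer.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # addr -> (value, dirty), oldest first (LRU)
        self.memory = {}            # state as of the last checkpoint
        self.checkpoints = 0        # number of checkpoints established

    def _checkpoint(self):
        # Commit every dirty line so that memory holds one consistent state.
        for addr, (value, dirty) in list(self.lines.items()):
            if dirty:
                self.memory[addr] = value
                self.lines[addr] = (value, False)
        self.checkpoints += 1

    def _evict_if_full(self):
        if len(self.lines) < self.capacity:
            return
        victim, (_, dirty) = next(iter(self.lines.items()))
        if dirty:
            # Replacing a dirty line is what triggers the checkpoint.
            self._checkpoint()
        del self.lines[victim]

    def write(self, addr, value):
        if addr not in self.lines:
            self._evict_if_full()
        self.lines[addr] = (value, True)
        self.lines.move_to_end(addr)

    def read(self, addr):
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return self.lines[addr][0]
        self._evict_if_full()
        value = self.memory.get(addr, 0)  # unwritten addresses read as 0
        self.lines[addr] = (value, False)
        return value


if __name__ == "__main__":
    cache = CheckpointingCache(capacity=2)
    cache.write("a", 1)
    cache.write("b", 2)
    cache.write("c", 3)  # "a" is dirty and must be replaced: checkpoint
    print(cache.checkpoints, cache.memory)
```

In this model, uncommitted writes never reach `memory` between checkpoints, so `memory` always reflects the most recent checkpoint. How often checkpoints occur depends on the access pattern and cache size rather than on a fixed period.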
Relaxed memory consistency models tolerate increased memory access latency in both hardware and soft...
We propose a method to incorporate coordinated checkpointing and rollback in high performance comp...
Checkpoint is defined as a designated place in a program at which normal processing is interrupted s...
No cache based techniques for roll-forward fault recovery exist at present. A split-cache approach i...
In this paper, we describe new protocols augmenting traditional cache coherency mechanisms to implem...
This thesis examines memory management and rollback recovery in parallel architectures. Three memory...
Checkpointing in a distributed system is essential for recovery to a globally consistent state after...
Large-scale distributed systems are very attractive for the execution of parallel applications requi...
Most recovery schemes that have been proposed for Distributed Shared Memory (DSM) systems require un...
We consider the problem of bringing a distributed system to a consistent state after transient fail...
This paper proposes an approach for adding fault tolerance, based on consistent checkpointing, to di...
In this work, a new roll-forward checkpointing scheme is proposed using basic checkpoints. The dir...
Checkpoint and recovery protocols are commonly used in distributed applications for providing fault ...
This thesis studies a forward recovery strategy using checkpointing and optimistic execution in para...