Abstract: Software cache coherence schemes are highly desirable in the design of scalable multiprocessors and massively parallel processors. The authors propose a software cache coherence scheme named ‘delayed precise invalidation’ (DPI). DPI is based on compile-time marking of references and a local hardware mechanism that invalidates stale data selectively and in parallel. With a small amount of additional hardware and a small set of cache management instructions, the scheme provides more cacheability and allows invalidation of a subset of an array's elements, overcoming some inefficiencies and deficiencies of previous software cache coherence schemes.
Cache coherence is one of the main challenges to tackle when designing a shared-memory multiprocesso...
This document describes a set of new techniques for improving the efficiency of compiler-directed so...
Reducing memory latency is critical to the performance of large-scale parallel systems. Due to the t...
for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions ...
The cache coherence maintenance problem has been the major obstacle in using private cache memory to...
Cache coherence protocols play an important role in the performance of distributed and centralized s...
An optimization scheme for a directory-based cache coherence protocol for multistage int...
Shared memory provides an attractive and intuitive programming model that makes good use of programm...
In a shared-memory multiprocessor with private caches, cached copies of a data item must be kept con...
Emerging multiprocessor architectures such as chip multiprocessors, embedded architectures, and mas...
Both hardware-controlled and compiler-directed mechanisms have been proposed for maintaining cache c...
Abstract—Directory-based cache coherence is a popular mechanism for chip multiprocessors and multico...
In large scale machines, thousands of processor cycles, in other words, missed opportunities to issu...
Data used by parallel programs can be divided into classes, based on how threads access it. For di...
To reduce overhead of cache coherence enforcement in shared-bus multiprocessors, we propose a selfin...