Shared memory multiprocessors make it practical to convert sequential programs to parallel ones in a variety of applications. An emerging class of shared memory multiprocessors is the cache-coherent nonuniform memory access (CC-NUMA) machine, which has private caches and a cache coherence protocol. Proposed hardware optimizations to CC-NUMA machines can shorten the time processors lose because of cache misses and invalidations. The authors look at cost-performance trade-offs for each of four proposed optimizations: release consistency, adaptive sequential prefetching, migratory sharing detection, and hybrid update/invalidate with a write cache. The four optimizations differ with respect to which application features they attack, what hardware resources they require, and wh...
A wide variety of computer architectures have been proposed to exploit parallelism at different gran...
Technical report. The next generation of scalable parallel systems (e.g., machines by KSR, Convex, and...
The last decade has produced enormous improvements in processor speeds without a corresponding impro...
Shared memory multiprocessors make it practical to convert sequential programs to parallel ones in...
The memory consistency model of a shared-memory multiprocessor determines the extent to which memory...
The speed of processors increases much faster than the memory access time. This makes memory accesse...
Shared memory provides an attractive and intuitive programming model that makes good use of programm...
The most commonly assumed memory consistency model for shared-memory multiprocessors is Sequential C...
This dissertation explores techniques for reducing the costs of inter-processor communication i...
The gap between processor speed and memory latency has led to the use of caches in the memory system...
During the last few years many different memory consistency protocols have been proposed. These rang...
Shared-memory multiprocessors are becoming increasingly popular as a high-performance, easy to progr...
Reordering instructions and data layout can bring significant performance improvement for memory bou...
We present design details and some initial performance results of a novel scalable shared memory mul...
The microprocessor industry has converged on the chip multiprocessor (CMP) as the architecture of choice to ...