It has been shown that many requests miss in all remote nodes in shared-memory multiprocessors. We are motivated by the observation that this behavior extends to much coarser-grain areas of memory. We define a region to be a contiguous, aligned memory area whose size is a power of two and observe that many requests find that no other node caches a block in the same region even for regions as large as 16K bytes. We propose RegionScout, a family of simple filter mechanisms that dynamically detect most non-shared regions. A node with a RegionScout filter can determine in advance that a request will miss in all remote nodes. RegionScout filters are implemented as a layered extension over existing snoop-based coherence systems. They require no c...
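To make the region idea concrete, here is a minimal Python sketch of a region-based snoop filter. It illustrates the general technique and is not RegionScout's actual hardware design. The assumptions are as follows. Blocks are 64 bytes and regions are 16K bytes. Each node keeps an exact per-region count of its locally cached blocks, standing in for whatever compact structure real hardware would use. Each node also keeps a set of regions it has learned are cached by no remote node. The names `region_of`, `Node`, and `request` are hypothetical. On a miss, a node whose non-shared set contains the region skips the snoop. Otherwise it broadcasts, and if no remote node reports caching the region, it records the region as non-shared. Any remote request to that region clears the record.

```python
from collections import Counter

BLOCK_SIZE = 64          # assumed cache-block size in bytes
REGION_SIZE = 16 * 1024  # assumed 16K-byte region; must be a power of two


def region_of(addr: int, region_size: int = REGION_SIZE) -> int:
    """Index of the aligned, power-of-two-sized region containing addr."""
    assert region_size > 0 and region_size & (region_size - 1) == 0
    return addr >> (region_size.bit_length() - 1)  # addr // region_size


class Node:
    """One node's cached blocks plus a hypothetical region filter."""

    def __init__(self, region_size: int = REGION_SIZE):
        self.region_size = region_size
        self.blocks = set()            # block-aligned addresses cached locally
        self.region_count = Counter()  # locally cached blocks per region (exact here)
        self.non_shared = set()        # regions known to be cached by no remote node

    def fill(self, addr: int) -> None:
        block = addr - addr % BLOCK_SIZE
        if block not in self.blocks:
            self.blocks.add(block)
            self.region_count[region_of(block, self.region_size)] += 1

    def evict(self, addr: int) -> None:
        block = addr - addr % BLOCK_SIZE
        if block in self.blocks:
            self.blocks.remove(block)
            r = region_of(block, self.region_size)
            self.region_count[r] -= 1
            if self.region_count[r] == 0:
                del self.region_count[r]

    def caches_region(self, addr: int) -> bool:
        """Reply to a remote snoop: do we hold any block of addr's region?"""
        return region_of(addr, self.region_size) in self.region_count

    def observe_remote_request(self, addr: int) -> None:
        """A remote node may now cache this region, so it is no longer non-shared."""
        self.non_shared.discard(region_of(addr, self.region_size))


def request(nodes, requester: int, addr: int) -> bool:
    """Serve a miss at nodes[requester]; return True if the broadcast was skipped."""
    me = nodes[requester]
    r = region_of(addr, me.region_size)
    if r in me.non_shared:
        me.fill(addr)  # no remote node caches this region: go straight to memory
        return True
    others = [n for i, n in enumerate(nodes) if i != requester]
    shared = any(n.caches_region(addr) for n in others)
    for n in others:
        n.observe_remote_request(addr)
    if not shared:
        me.non_shared.add(r)  # later misses to this region can skip the snoop
    me.fill(addr)
    return False
```

With two nodes, the first miss to region 0 at node 0 is broadcast and the region is recorded as non-shared. A second miss to the same region then skips the broadcast. Once node 1 touches that region, node 0 drops its record and resumes snooping. The sketch stays correct because every node that starts caching a region must either broadcast, which clears stale records elsewhere, or already hold a record proving that no one else caches the region.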
The coherence protocol is a first-order design concern in multicore designs. Directory protocols are...
Large-scale cache-coherent systems often impose unnecessary overhead on data that is thread...
Thesis (Ph. D.)--University of Rochester. Dept. of Computer Science, 2013. Chip multiprocessors conti...
Coarse-grain coherence tracking is a new technique that extends a conventional coherence mechanism a...
With transistor miniaturization leading to an abundance of on-chip resources and uniprocessor design...
Parallel applications exhibit a wide variety of memory reference patterns. Designing a memory archit...
To maintain coherence in conventional shared-memory multiprocessor systems, processors first check o...
Multicore systems have reached a stage where they are inevitable in the embedded world. This transit...
Design complexity and limited power budget are causing the number of cores on the same chip to grow ...
Nowadays, most computer manufacturers offer chip multiprocessors (CMPs) due to the ever-increasing...
Hiding memory latency is critical in modern machines. Typically, machines have used cache and addres...
Coherence protocols consume an important fraction of power to determine which coherence action shoul...
To support legacy software, large CMPs often provide cache coherence via an on-chip directory rathe...
We present a novel methodology for power reduction in embedded multiprocessor systems. Maintaining l...
This dissertation explores techniques for reducing the costs of inter-processor communication i...