Conventional directory coherence operates at the finest granularity possible, that of a cache block. While simple, this organization fails to exploit a common application behavior: at any given point in time, large, contiguous chunks of memory are often accessed by only a single core. We take advantage of this behavior and investigate reducing the coherence directory size by tracking coherence at multiple granularities. We show that such a Multi-grain Directory (MGD) can significantly reduce the required number of directory entries across a variety of workloads. Our analysis shows that a simple dual-grain directory (DGD) obtains the majority of the benefit while tracking individual cache blocks and coarse-grain regions of 1KB...
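The dual-grain idea in the abstract above can be illustrated with a small sketch. This is not the paper's design, only a toy model under stated assumptions: a 64-byte cache block (the abstract gives only the 1KB region size), no evictions, and a hypothetical `DualGrainDirectory` class whose names are invented for illustration. A region touched by a single core is tracked with one coarse entry; once a second core touches it, the coarse entry is split into per-block entries.

```python
BLOCK_BYTES = 64      # assumption: block size is not stated in the abstract
REGION_BYTES = 1024   # coarse-grain region size named in the abstract
BLOCKS_PER_REGION = REGION_BYTES // BLOCK_BYTES


class DualGrainDirectory:
    """Toy model (hypothetical): one entry per private region or per shared block."""

    def __init__(self):
        self.regions = {}  # region id -> (owner core, set of block ids it touched)
        self.blocks = {}   # block id -> set of sharer cores

    def _region_has_blocks(self, region):
        # True if any block of this region is already tracked at fine grain.
        first = region * BLOCKS_PER_REGION
        return any(b in self.blocks
                   for b in range(first, first + BLOCKS_PER_REGION))

    def access(self, core, addr):
        block = addr // BLOCK_BYTES
        region = addr // REGION_BYTES
        if region in self.regions:
            owner, touched = self.regions[region]
            if owner == core:
                touched.add(block)  # still private: the coarse entry suffices
                return
            # A second core arrived: split the coarse entry into block entries.
            del self.regions[region]
            for b in touched:
                self.blocks[b] = {owner}
            self.blocks.setdefault(block, set()).add(core)
            return
        if self._region_has_blocks(region):
            # Region is already tracked per block; stay at fine grain.
            self.blocks.setdefault(block, set()).add(core)
        else:
            # First touch of this region: allocate a single coarse entry.
            self.regions[region] = (core, {block})

    def entry_count(self):
        return len(self.regions) + len(self.blocks)
```

In this model, a core that streams through a whole 1KB region occupies one directory entry rather than sixteen, and the cost reverts to one entry per block only when the region becomes shared.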
Chip multiprocessors (CMPs) require effective cache coherence protocols as well as fast virtual-to-...
Today's systems are designed with multi-core architectures. The idea behind this is to achieve high sy...
Today’s multicore chips commonly implement shared memory with cache coherence as low-level support f...
A key challenge in architecting a multicore processor is efficiently maintaining cache coherence. Di...
© 2013 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for...
Thesis (Ph.D.)--University of Rochester. Dept. of Computer Science, 2013. Chip multiprocessors conti...
Recent research shows that the occupancy of the coherence controllers is a major performance bottlen...
The cache coherence problem is a major concern in the design of shared-memory multiprocessors. As the nu...
With increasing core counts, the scalability of directory-based cache coherence has become a challen...
To support legacy software, large CMPs often provide cache coherence via an on-chip directory rathe...
Both hardware-controlled and compiler-directed mechanisms have been proposed for maintaining cache c...
Growing core counts have highlighted the need for scalable on-chip coherence mechanisms. The increas...
We introduce the concept of deadlock-free migration-based coherent shared memory to the NUCA family ...
As computing power has increased over the past few decades, science and engineering have found more ...