To support legacy software, large CMPs often provide cache coherence via an on-chip directory rather than snooping. In those designs, a key challenge is maximizing the effectiveness of precious on-chip directory state. Most current directory protocols miss an opportunity by organizing all state in per-block records. To increase the "reach" of on-chip directory state, we apply ideas from snooping region coherence to develop a dual-grain CMP directory protocol. First, we enable a tradeoff between unnecessary probes (e.g., invalidations) and on-chip directory storage size by organizing a directory entry with both per-1KB-region state and per-64B-block state. Second, to optimize for sparsely accessed regions, we evaluate an asymme...
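The dual-grain idea in this abstract can be illustrated with a minimal sketch. Only the 1KB region size and 64B block size come from the abstract; the class `DualGrainEntry`, the parameter `max_tracked_blocks`, and the fallback rule are hypothetical and chosen for illustration, not taken from the paper's protocol. The sketch assumes an entry tracks sharers precisely for a few blocks and falls back to the coarser region-level sharer set for the rest, which costs extra probes but saves storage.

```python
REGION_BYTES = 1024                               # per-region grain (from the abstract)
BLOCK_BYTES = 64                                  # per-block grain (from the abstract)
BLOCKS_PER_REGION = REGION_BYTES // BLOCK_BYTES   # 16 blocks in each region


def split_address(addr):
    """Return (region number, block index within that region) for a byte address."""
    region = addr // REGION_BYTES
    block = (addr % REGION_BYTES) // BLOCK_BYTES
    return region, block


class DualGrainEntry:
    """Hypothetical directory entry holding region-level and block-level sharers."""

    def __init__(self, max_tracked_blocks):
        # Assumption: only a limited number of blocks get precise tracking.
        self.max_tracked_blocks = max_tracked_blocks
        self.region_sharers = set()   # cores that touched any block in the region
        self.block_sharers = {}       # block index -> cores that touched that block

    def record_access(self, block, core):
        self.region_sharers.add(core)
        tracked = block in self.block_sharers
        if tracked or len(self.block_sharers) < self.max_tracked_blocks:
            self.block_sharers.setdefault(block, set()).add(core)

    def probe_targets(self, block):
        """Cores to probe: precise if the block is tracked, else the region superset."""
        if block in self.block_sharers:
            return set(self.block_sharers[block])
        return set(self.region_sharers)
```

With `max_tracked_blocks=1`, a second block in the same region is no longer tracked precisely, so probes for it go to every core that touched the region; that is the storage-versus-unnecessary-probes tradeoff the abstract refers to.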
We present a new arrangement of directory bits called the segment directory to improve directory sto...
Design complexity and limited power budget are causing the number of cores on the same chip to grow ...
As shown in some prior studies, a significant percentage of data blocks accessed in parallel codes a...
Conventional directory coherence operates at the finest granularity possible, that of a cache block....
A key challenge in architecting a multicore processor is efficiently maintaining cache coherence. Di...
Nowadays, most computer manufacturers offer chip multiprocessors (CMPs) due to the always increasing...
Future CMP designs that will integrate tens of processor cores on-chip will be constrained by area a...
Although directory-based cache coherence protocols are the best choice when designing chip multiproc...
Coherence protocols consume an important fraction of power to determine which coherence action shoul...
Chip multiprocessors (CMPs) require effective cache coherence protocols as well as fast virtual-to-...
Coherence protocols consume an important fraction of power to determine which coherence ac...
Cataloged from PDF version of article. Thesis (M.S.): Bilkent University, Department of Computer Engi...
If current trends continue, today’s small-scale general-purpose CMPs will soon be replaced...
Although directory-based cache coherence protocols are the best choice when designing la...
Growing core counts have highlighted the need for scalable on-chip coherence mechanisms. The increas...