Coherence protocols consume a significant fraction of power to determine which coherence action should take place. In this paper we focus on CMPs with a shared cache and a directory-based coherence protocol implemented as a duplicate of the local caches' tags. We observe that a large fraction of directory lookups miss because the requested block is not cached in any local cache. We propose adding a filter before the directory lookup to reduce the number of lookups to this structure. The filter identifies whether the current block was last accessed as data or as an instruction. With this information, looking up the whole directory can be avoided for most accesses. We evaluate the filter in a CMP with 8 in-order processors with 4 t...
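The filtering idea can be illustrated with a minimal sketch. It is not the paper's implementation: the class `FilteredDirectory`, its methods, the split of the duplicate tags into per-core instruction and data arrays, and the conservative "ID" state for blocks seen as both kinds are all assumptions made for illustration. The sketch only counts tag probes to show how knowing the last access kind can restrict a lookup to part of the directory.

```python
from dataclasses import dataclass, field


@dataclass
class FilteredDirectory:
    """Hypothetical duplicate-tag directory with a last-access-kind filter."""
    num_cores: int
    itags: list = field(init=False)      # per-core copy of L1 instruction-cache tags
    dtags: list = field(init=False)      # per-core copy of L1 data-cache tags
    last_kind: dict = field(init=False)  # filter: block -> "I", "D" or "ID"
    probes: int = field(init=False, default=0)  # tag arrays probed so far

    def __post_init__(self):
        self.itags = [set() for _ in range(self.num_cores)]
        self.dtags = [set() for _ in range(self.num_cores)]
        self.last_kind = {}

    def record_fill(self, core, block, kind):
        """Record that `core` cached `block` as an instruction ("I") or data ("D")."""
        (self.itags if kind == "I" else self.dtags)[core].add(block)
        previous = self.last_kind.get(block, kind)
        # Assumption: if a block has been seen as both kinds, fall back to "ID"
        # so that later lookups stay conservative and probe everything.
        self.last_kind[block] = kind if previous == kind else "ID"

    def sharers(self, block):
        """Return the cores caching `block`, probing only the arrays the filter allows."""
        kind = self.last_kind.get(block, "ID")  # unknown block: full lookup
        arrays = []
        if "I" in kind:
            arrays.append(self.itags)
        if "D" in kind:
            arrays.append(self.dtags)
        found = set()
        for tags in arrays:
            for core, cached in enumerate(tags):
                self.probes += 1
                if block in cached:
                    found.add(core)
        return sorted(found)


directory = FilteredDirectory(num_cores=8)
directory.record_fill(core=2, block=0x40, kind="D")
print(directory.sharers(0x40), directory.probes)
```

In this sketch a block known to be data is resolved by probing only the eight data tag arrays instead of all sixteen; how the actual filter is organized and what it does for unknown blocks is not stated in the text above.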
This paper presents a non-blocking directory-based cache coherence protocol to improve the performan...
A key challenge in architecting a multicore processor is efficiently maintaining cache coherence. Di...
Although directory-based cache coherence protocols are the best choice when designing chip multiproc...
Abstract: Coherence protocols consume an important fraction of power to determine which coherence ac...
Nowadays, most computer manufacturers offer chip multiprocessors (CMPs) due to the ever-increasing...
Abstract: Although directory-based cache coherence protocols are the best choice when designing la...
Future CMP designs that will integrate tens of processor cores on-chip will be constrained by area a...
To support legacy software, large CMPs often provide cache coherence via an on-chip directory rathe...
This paper evaluates several techniques to save leakage in CMP L2 caches by selectively switching of...
As the number of cores increases on chip multiprocessors, coherence is fast becoming a central issue...
Directory-based cache coherence is accepted as the common technique in large scale shared m...
It has been shown that many requests miss in all remote nodes in shared memory multiprocessors. We a...
Abstract—This paper evaluates several techniques to save leakage in CMP L2 caches by selectively swi...
Today's systems are designed with multicore architectures. The idea behind this is to achieve high sy...
High-end computing increasingly relies on shared-memory multiprocessors (SMPs), such as clusters of ...