Recent research shows that the high occupancy of Coherence Controllers (CCs) is a major performance bottleneck in scalable shared-memory multiprocessors. In this paper, we propose applying microarchitectural enhancements from microprocessor design to improve the throughput of hardwired CCs. These enhancements are CC support for nonblocking execution, early fetches of directory and L3 information, and superpipelining. Nonblocking execution in the CC reduces stalls by processing subsequent coherence transactions while misses in the directory cache and tag cache are outstanding. Early fetching in the CC hides misses in the directory and tag caches and therefore also removes stalls. Finally, superpipelining in the CC increases its pr...
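The effect of nonblocking execution on CC occupancy can be illustrated with a toy timing model. This is a minimal sketch under stated assumptions, not the paper's design. The names `Txn`, `SERVICE_CYCLES`, `MISS_LATENCY`, `max_pending`, `blocking_finish`, and `nonblocking_finish` are hypothetical. The cycle counts are invented. All transactions are assumed to target distinct lines, so there are no ordering constraints. Issuing a directory-cache fill is assumed to cost the CC nothing extra.

```python
from dataclasses import dataclass

# Hypothetical parameters, chosen only for illustration.
SERVICE_CYCLES = 2   # CC occupancy to process one coherence transaction
MISS_LATENCY = 10    # cycles for a directory/tag cache fill to return


@dataclass
class Txn:
    arrival: int   # cycle at which the transaction reaches the CC
    dir_hit: bool  # True if its directory entry hits in the directory cache


def blocking_finish(txns):
    """Blocking CC: a directory cache miss stalls the controller until the fill returns."""
    t = 0
    for x in txns:
        t = max(t, x.arrival)
        if not x.dir_hit:
            t += MISS_LATENCY  # CC sits idle while the miss is serviced
        t += SERVICE_CYCLES
    return t


def nonblocking_finish(txns, max_pending=4):
    """Nonblocking CC: park missing transactions and keep processing later ones.

    Assumes max_pending >= 1 and that transactions touch distinct lines.
    """
    t, i = 0, 0
    pending = []  # fill-return cycles of parked misses (bounded, like MSHRs)
    while i < len(txns) or pending:
        ready = [r for r in pending if r <= t]
        head = txns[i] if i < len(txns) else None
        if ready:
            pending.remove(min(ready))  # fill returned: finish the parked transaction
            t += SERVICE_CYCLES
        elif (head is not None and head.arrival <= t
              and (head.dir_hit or len(pending) < max_pending)):
            i += 1
            if head.dir_hit:
                t += SERVICE_CYCLES
            else:
                pending.append(t + MISS_LATENCY)  # issue the fill, park, move on
        else:
            # Nothing runnable now: jump to the next fill return or arrival.
            events = list(pending)
            if head is not None and head.arrival > t:
                events.append(head.arrival)
            t = min(events)
    return t


if __name__ == "__main__":
    # Two misses interleaved with hits, all arriving at cycle 0.
    trace = [Txn(0, h) for h in (False, True, True, True, False, True, True, True)]
    print(blocking_finish(trace))     # 36
    print(nonblocking_finish(trace))  # 18
```

In this toy trace the blocking controller spends 20 of its 36 cycles stalled on fills. The nonblocking controller overlaps those fills with the hits that follow and finishes at cycle 18. Bounding `max_pending` models the finite tracking state a real CC would need. With `max_pending=1` the same trace finishes at cycle 24, because the second miss must wait for the first to drain.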
Caches have the potential to provide multiprocessors with an automatic mechanism for reducing both n...
Recently there has been considerable interest in cache coherency protocols in shared-memory multipro...
We introduce an architectural approach to improve memory system performance in both uniprocessor and...
Recent research shows that the occupancy of the coherence controllers is a major performance bottlen...
Scalable distributed shared-memory architectures rely on coherence controllers on each processing n...
Scalable distributed shared-memory architectures rely on coherence controllers on each proc...
Previous work in scalable hardware distributed shared memory (DSM) multiprocessors has established t...
101 p. Thesis (Ph.D.), University of Illinois at Urbana-Champaign, 2001. Applying a combination of the...
Both hardware-controlled and compiler-directed mechanisms have been proposed for maintaining cache c...
Inability to hide main memory latency has been increasingly limiting the performance of modern proce...
As the number of cores increases on chip multiprocessors, coherence is fast becoming a central issue...
Asymmetric coherency is a new concept to support non-uniform workloads in multicore process...
System-on-a-chip (SoC) designs are characterized by heavy reuse of IP blocks to satisfy specific comp...
Although directory-based cache coherence protocols are the best choice when designing la...