Cache depot is a performance enhancement technique for cache-coherent non-uniform memory access (CC-NUMA) multiprocessors, in which nodes in the system store extra memory blocks on behalf of other nodes. In this way, memory requests from a node can be satisfied by nearby depot nodes without going all the way to the home node. This not only reduces memory access latency and network traffic, but also spreads the network load more evenly. In this paper, we study a design strategy for cache depot that (1) enhances the network interface of each node with a depot cache, which stores those extra memory blocks for other nodes, and (2) employs a new multicast routing scheme, called multi-hop worms, which works cooperative...
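The basic idea of a request being satisfied by a nearby depot node can be illustrated with a toy model. This is only a minimal sketch under stated assumptions, not the paper's design: the names `Node`, `depot`, and `read_block` are hypothetical, the route from requester to home node is assumed to be a fixed list of nodes, and coherence actions and the multicast routing scheme are left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node: home memory plus a depot cache for other nodes' blocks."""
    node_id: int
    memory: dict = field(default_factory=dict)  # blocks this node is home for
    depot: dict = field(default_factory=dict)   # blocks stored on behalf of other nodes


def read_block(path, addr):
    """Follow the route from the requester (path[0]) toward the home node.

    Returns (value, hops), where hops is the number of links traversed
    before some node could supply the block.
    """
    for hops, node in enumerate(path[1:], start=1):
        if addr in node.depot:      # an intermediate depot node holds a copy
            return node.depot[addr], hops
        if addr in node.memory:     # reached the home node
            return node.memory[addr], hops
    raise KeyError(addr)


# Assumed route: node 0 requests a block whose home is node 3.
nodes = [Node(i) for i in range(4)]
nodes[3].memory[0x40] = "data"
value, hops_without_depot = read_block(nodes, 0x40)

# Node 1 now stores a copy on behalf of node 3.
nodes[1].depot[0x40] = "data"
value, hops_with_depot = read_block(nodes, 0x40)
```

In this toy setup the request travels fewer links once an intermediate node holds a depot copy, which is the latency and traffic effect the abstract describes. A real design would also have to keep depot copies coherent with the home node.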
Two interesting variations of large-scale shared-memory machines that have recently emerged are cac...
As Internet and information technology have continued developing, the necessity for fast pa...
Modern systems are able to put two or more processors on the same die (Chip Multiprocessors, CMP),...
Rapid advances in interconnection networks in multiprocessors are closing the gap betwee...
Directory hints (DHs) help a node in a cache coherent non-uniform memory (CC-NUMA) share...
Directory hints (DHs) help a node in a cache coherent non-uniform memory (CC-NUMA) share...
In this paper, performance of multistage interconnection network with wormhole routing and packet sw...
Cache Coherent NUMA (ccNUMA) architectures are a widespread paradigm due to the benefits they provid...
Non-uniform cache architectures (NUCAs) are a novel design paradigm for large last-level on-chip cac...
Cache Coherent NUMA (ccNUMA) architectures are a widespread paradigm due to the benefits they provid...
Caches have the potential to provide multiprocessors with an automatic mechanism for reducing both n...
Cache Coherent Non-Uniform Memory Access (CC-NUMA) architectures have received strong interest from...
Multi-core architectures are the future for high-performance computing and are omnipresent these day...
An optimization scheme for a directory-based cache coherence protocol for multistage int...
The paper introduces Network-on-Chip (NoC) design methodology and low cost mechanisms for supporting...