Cache Coherent NUMA (ccNUMA) architectures are a widespread paradigm because they scale well in both core count and memory capacity, and the flat memory address space they offer considerably improves programmability. However, ccNUMA architectures require sophisticated and expensive cache coherence protocols to enforce correctness during parallel execution, and these protocols trigger a significant amount of on- and off-chip traffic. This paper analyses how coherence traffic may be best constrained in a large, real ccNUMA platform through a joint hardware/software approach. For several benchmarks, we study coherence traffic in detail under the influence of an added hierarchical cache layer in the directory protocol...
A solution adopted in the past to design high performance multiprocessor systems that wer...
The speed of processors increases much faster than the memory access time. This makes memory accesse...
The paper introduces Network-on-Chip (NoC) design methodology and low cost mechanisms for supporting...
We introduce the concept of deadlock-free migration-based coherent shared memory to the NUCA family ...
Designing an efficient memory system is a big challenge for future multicore systems. In particular,...
Two interesting variations of large-scale shared-memory machines that have recently emerged are cac...
With increasing core counts, the scalability of directory-based cache coherence has become a challen...
Cache Coherent Non-Uniform Memory Access (CC-NUMA) architectures have received strong interest from...
Memory access latency is hence non-uniform, because it depends on where the request ori...
Directory-based cache coherence is the de facto standard for scalable shared-memory multi/many-cores...
We argue that OS-provided data coherence on non-cache-coherent NUMA multiprocessors (machines with a...
Cache depot is a performance enhancement technique on cache-coherent non-uniform memory ...