Cache Coherent NUMA (ccNUMA) architectures are a widespread paradigm due to the benefits they provide for scaling core count and memory capacity. Also, the flat memory address space they offer considerably improves programmability. However, ccNUMA architectures require sophisticated and expensive cache coherence protocols to enforce correctness during parallel executions, which trigger a significant amount of on- and off-chip traffic in the system. This paper analyses how coherence traffic may be best constrained in a large, real ccNUMA platform comprising 288 cores through the use of a joint hardware/software approach. For several benchmarks, we study coherence traffic in detail under the influence of an added hierarchical cache layer in t...
This paper presents a unique virtual memory page management scheme for loosely coupled CCNUMA platfo...
Shared memory provides an attractive and intuitive programming model that makes good use of programm...
Modern systems are able to put two or more processors on the same die (Chip Multiprocessors, CMP),...
Cache Coherent Non-Uniform Memory Access (CC-NUMA) architectures have received strong interest from...
Cache depot is a performance enhancement technique on cache-coherent non-uniform memory ...
Two interesting variations of large-scale shared-memory machines that have recently emerged are cac...
The latency of memory access times is hence non-uniform, because it depends on where the request ori...
Designing an efficient memory system is a big challenge for future multicore systems. In particular,...
We introduce the concept of deadlock-free migration-based coherent shared memory to the NUCA family ...
The speed of processors increases much faster than the memory access time. This makes memory accesse...
We argue that OS-provided data coherence on non-cache-coherent NUMA multiprocessors (machines with a...
A solution adopted in the past to design high-performance multiprocessor systems that wer...