Cache Coherent Non-Uniform Memory Access (CC-NUMA) architectures have received strong interest from both academia and industry. This paper studies the performance impact of design choices at different levels of address and memory mapping on CC-NUMA architectures. Through execution-driven simulations of five numerical programs, we find close interactions between data allocation, global address translation and cache set-addressing. Our results strongly discourage the use of direct-mapped caches in CC-NUMA machines. Results also show that data allocation often has a large impact on memory miss ratio and execution time. A compiler scheme that allocates data and parallel tasks simultaneously is shown to perform consistently well.
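As a rough illustration of the interaction between data allocation and cache set-addressing mentioned above, the sketch below (cache and page parameters are illustrative assumptions, not taken from the paper) computes the set index of a physical address. With these parameters the upper index bits come from the physical frame number, so the OS's choice of frame during data allocation decides which cache sets a page's lines can occupy; a direct-mapped cache then holds only one of each pair of conflicting lines, while a set-associative cache can hold several per set.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative cache parameters -- assumptions, not taken from the paper. */
#define BLOCK_SIZE   32u          /* bytes per cache line       */
#define CACHE_SIZE   (64u * 1024) /* 64 KB cache                */
#define ASSOC_DM     1u           /* direct-mapped              */
#define ASSOC_SA     4u           /* 4-way set-associative      */

/* Set index = (address / block size) mod number of sets.
   With 4 KB pages, the upper index bits (12..15 here) come from the
   physical frame number, i.e. from where the OS placed the page. */
static unsigned set_index(uint64_t paddr, uint32_t assoc)
{
    uint32_t num_sets = CACHE_SIZE / (BLOCK_SIZE * assoc);
    return (unsigned)((paddr / BLOCK_SIZE) % num_sets);
}

int main(void)
{
    /* Two physical pages whose frame numbers differ only above the index
       field: corresponding lines land in the same set.  A direct-mapped
       cache keeps only one of them; a 4-way cache keeps up to four
       conflicting lines per set before evicting. */
    uint64_t page_a = 0x10000, page_b = 0x20000;
    printf("direct-mapped: set %u vs set %u\n",
           set_index(page_a, ASSOC_DM), set_index(page_b, ASSOC_DM));
    printf("4-way        : set %u vs set %u\n",
           set_index(page_a, ASSOC_SA), set_index(page_b, ASSOC_SA));
    return 0;
}
```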
Shared memory provides an attractive and intuitive programming model that makes good use of programm...
Processor speeds are increasing much faster than memory access times are improving. This makes memory accesse...
The dominant architecture for the next generation of shared-memory multiprocessors is CC-NUMA (cache...
Two interesting variations of large-scale shared-memory machines that have recently emerged are cac...
Memory access latency is hence non-uniform, because it depends on where the request ori...
The choice of a good data distribution scheme is critical to performance of data-parallel applicatio...
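The excerpt above is cut off, but as a rough illustration of what a data distribution scheme decides, the sketch below (hypothetical array length and processor count) computes which processor owns element i under the two classic block and cyclic distributions; on a CC-NUMA machine, the owner's node is where the element's page would ideally be allocated.

```c
#include <stdio.h>

#define N      1024   /* array length (illustrative)         */
#define NPROCS 4      /* number of processors (illustrative) */

/* Block distribution: contiguous chunks of ceil(N/P) elements per processor. */
static int block_owner(int i)
{
    int chunk = (N + NPROCS - 1) / NPROCS;
    return i / chunk;
}

/* Cyclic distribution: elements dealt out round-robin. */
static int cyclic_owner(int i)
{
    return i % NPROCS;
}

int main(void)
{
    for (int i = 0; i < N; i += 300)
        printf("a[%4d]: block -> P%d, cyclic -> P%d\n",
               i, block_owner(i), cyclic_owner(i));
    return 0;
}
```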
An important aspect of workload characterization is understanding memory system performance...
Cache Coherent NUMA (ccNUMA) architectures are a widespread paradigm due to the benefits they provid...
Non-Uniform Cache Architectures (NUCA) have been proposed as a solution to overcome wire delays that...
Multiprocessor memory reference traces provide a wealth of information on the behavior of parallel p...
There are three major classes of MIMD multiprocessors: cache-coherent machines, NUMA (non-uniform me...
Many applications are memory intensive and thus are bounded by memory latency and bandwidth. While i...
Nonuniform memory access time (referred to as NUMA) is an important feature in the design of large s...
A common feature of many scalable parallel machines is non-uniform memory access (NUMA) --- data acc...
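As a hedged illustration of how NUMA locality is commonly exploited in practice (a generic first-touch sketch, not a technique described in the excerpt), the code below initializes an array in parallel so that, under a first-touch page placement policy such as the Linux default and with threads pinned to nodes, each page is allocated on the node of the thread that later computes on it.

```c
#include <stdlib.h>
#include <omp.h>

#define N (1L << 24)

int main(void)
{
    double *a = malloc(N * sizeof *a);
    if (!a)
        return 1;

    /* First touch: each thread touches the pages of the range it will
       later work on, so those pages are placed on its local node
       (assuming a first-touch placement policy and pinned threads). */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < N; i++)
        a[i] = 0.0;

    /* Same static schedule: each thread now accesses mostly local pages. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < N; i++)
        a[i] = 2.0 * a[i] + 1.0;

    free(a);
    return 0;
}
```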