Cache-coherent non-uniform memory access (ccNUMA) architectures have attracted considerable academic and industry interest as a promising direction for large-scale parallel computing. Data placement has been used as a major optimization method on such machines. This study examined scalability and the effect of data placement on a state-of-the-art ccNUMA machine, the SGI Origin, using 16 processors. Three applications from SPLASH-2 were used: FFT, Radix, and Barnes-Hut. The results showed that FFT and Radix cannot scale to 16 processors with 70% efficiency even for the largest data sizes tested. Barnes-Hut does not scale for small data sizes but scales linearly for large input sizes. The results also showed that data placement does not make any differ...
Cache Coherent NUMA (ccNUMA) architectures are a widespread paradigm due to the benefits they provid...
The results produced by five different MPI benchmark programs on an SGI Altix 3700 are analyzed and ...
The SGI Origin 2000 is a cache-coherent non-uniform memory access (ccNUMA) multiprocessor designed a...
Cache Coherent Non-Uniform Memory Access (CC-NUMA) architectures have received strong interest from...
Two interesting variations of large-scale shared-memory machines that have recently emerged are cac...
This whitepaper studies the various aspects and challenges of performance scaling on large scale sha...
We present a study of the architectural requirements and scalability of the NAS Parallel Benchmarks....
The results produced by five different MPI benchmark programs on an SGI Altix 3700 are analyzed and...
The speed of processors increases much faster than the memory access time. This makes memory accesse...
Performance and scalability of high performance scientific applications on large scale parallel mach...
Due to their excellent price-performance ratio, clusters built from commodity nodes have become broa...
Across a broad range of applications, multicore technology is the most important factor that drives...
Currently, parallel platforms based on large scale hierarchical shared memory multiprocesso...
Since the first vector supercomputers in the mid-1970s, the largest scale applications have traditi...