The results produced by five different MPI benchmark programs on an SGI Altix 3700 are analyzed and compared. There are significant differences in the results for some MPI operations. We investigate the reasons for these discrepancies, which are due to differences in the measurement techniques, implementation details, and default configurations of the different benchmarks. The variation in results on the Altix is generally much greater than on a distributed-memory machine, due primarily to the ccNUMA architecture and the importance of cache effects, as well as some implementation details of the SGI MPI libraries.
This paper describes investigations on the memory performance of the shared memory systems Cray X-MP...
Cache-coherent non-uniform memory access (ccNUMA) architectures have attracted lots of academic and ...
There are several benchmark programs available to measure the performance of MPI on parallel comput...
We compare the performance of three major programming models— a load-store cache-coherent shared add...
This paper presents the comparison of the COMOPS benchmark performance in MPI and shared memory on t...
This paper reports the measurements of MPI communication benchmarking on Khaldun cluster which ran o...
The main objective of the MPI communication library is to enable portable parallel programming with ...
Recently MPI implementations have been extended to support accelerator devices, Intel Many Integrate...
There are three major classes of MIMD multiprocessors: cache-coherent machines, NUMA (non-uniform me...
The HPC Challenge (HPCC) Benchmark suite and the Intel MPI Benchmark (IMB) are used to compa...
We present a study of the architectural requirements and scalability of the NAS Parallel Benchmarks....
The majority of current HPC applications are composed of complex and irregular data structures that ...
Abstract: The developments of multi-core technology have induced big challenges to software structur...