This thesis analysed the QLogic InfiniPath QLE7140 HCA and its onload architecture and compared the results to the Mellanox InfiniHost III Lx HCA, which uses an offload architecture. As expected, the QLogic InfiniPath QLE7140 HCA can outperform the Mellanox InfiniHost III Lx HCA in terms of latency and bandwidth on our test system in various test scenarios. The benchmarks showed that sending messages with multiple threads in parallel can increase the bandwidth greatly, while bi-directional sends cut the effective bandwidth for one HCA by up to 30%. Different all-reduce algorithms were evaluated and compared with the help of the LogGP model. The comparison showed that new all-reduce algorithms can outperform the ones already implemented in Open MPI for...
The performance of collective communication operations is one of the deciding factors in the overa...
This paper presents a portable optimization for MPI communications, called PRAcTICaL-MPI (Portable A...
This report introduces a version of MPICH handling efficiently different networks simultaneous...
High-performance systems are undergoing a major shift as commodity multi-core systems become increas...
The performance of MPI implementation operations still presents critical issues for high performance...
Modern high performance computing (HPC) applications, for example adaptive mesh refinement and mul...
In this paper, we present an initial performance evaluation of InfiniBand HCAs from Mella...
Parallel programmers typically assume that all resources required for a program's execution are dedi...
With processor speeds no longer doubling every 18-24 months owing to the exponential increase in pow...
Message Passing Interface is widely used for Parallel and Distributed Computing. MPICH and LAM are p...
Although InfiniBand Architecture is relatively new in the high performance computing area, it offers ...
We describe the design and implementation of MPI-NP, a Myrinet communication system tailored to sup...