We present a microbenchmark suite to evaluate InfiniBand™ implementations with regard to single-message performance and the addressing of many hosts. We use a 1:n communication pattern to assess the latency and bandwidth for all combinations of InfiniBand™'s transport services and functions. The results gathered in this study are used to optimize MPI collective communication operations, where 1:n communication schemes are not widely used today. We show that applications as well as collective algorithms can benefit from sending multiple messages in a single round. Moreover, the results will be used to choose the transport service and function for developing InfiniBand™-optimized collective communication functions. Our study compa...
In this work we analyze the communication load imbalance generated by irregular-data applications ru...
The InfiniBand Architecture provides availability, reliability, scalability, and performance for ser...
The goal of this work is an optimized implementation of the MPI-1 standard's reduction opera...
The performance of collective communication operations is one of the deciding factors in the overa...
The performance of MPI implementation operations still presents critical issues for high performance...
The performance of MPI implementation operations still presents critical issues for high performance...
In the area of cluster computing, InfiniBand is becoming increasingly popular due to its open standa...
Designing new and optimal algorithms for a specific architecture requires accurate modelling of this...
Collective communication is an important subset of the Message Passing Interface. Improving the perform...
The MPI_Barrier() call can be crucial for several applications and has been the target of different opti...
Clusters of several thousand nodes interconnected with InfiniBand, an emerging high-performance inte...
The MPI_Barrier-collective operation, as a part of the MPI-1.1 standard, is extremely important for ...
InfiniBand (IB) is a popular network technology for modern high-performance computing systems. MPI i...
This work presents and evaluates algorithms for MPI collective communication operations on high perf...
We evaluate the architectural support of collective communication operations on the IBM SP2, Cray T3...