The thesis investigates Message Passing Interface (MPI) support for shared memory programming on modern hardware architectures with multiple Non-Uniform Memory Access (NUMA) domains. We investigate its performance in two case studies: matrix-matrix multiplication and Conway’s Game of Life. We compare the performance of MPI shared memory, in terms of execution time and memory consumption, with that of implementations using OpenMP and MPI point-to-point communication, also called "MPI two-sided". We perform strong scaling tests in both test cases. We observe that the MPI two-sided implementation is 21% and 18% faster than the MPI shared memory and OpenMP implementations, respectively, in the matrix-matrix multiplication case when using 32 processes. M...
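To make the programming model concrete, the sketch below shows the core of MPI-3 shared memory as the thesis describes it: ranks on the same node split off a communicator, allocate a shared window, and access each other's segments with plain loads and stores. This is a minimal illustration using only standard MPI C calls, not code taken from the thesis itself.

    /* Minimal sketch of MPI-3 shared memory windows (standard MPI C API). */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        /* Split off the ranks that can share memory (same node). */
        MPI_Comm node_comm;
        MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                            MPI_INFO_NULL, &node_comm);

        int rank;
        MPI_Comm_rank(node_comm, &rank);

        /* Each rank contributes a slice of one node-wide shared window. */
        double *local;
        MPI_Win win;
        MPI_Win_allocate_shared(1024 * sizeof(double), sizeof(double),
                                MPI_INFO_NULL, node_comm, &local, &win);

        local[0] = (double)rank;   /* plain store into shared memory */
        MPI_Win_fence(0, win);     /* synchronize before reading peers */

        /* Locate a neighbor's segment and read it with an ordinary load. */
        if (rank > 0) {
            MPI_Aint size;
            int disp;
            double *peer;
            MPI_Win_shared_query(win, rank - 1, &size, &disp, &peer);
            printf("rank %d sees neighbor's value %.1f\n", rank, peer[0]);
        }

        MPI_Win_free(&win);
        MPI_Comm_free(&node_comm);
        MPI_Finalize();
        return 0;
    }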
There are several benchmark programs available to measure the performance of MPI on parallel comput...
Abstract—A comparison between OpenMP as a thread programming model and MPI as a message passing programm...
An introduction to the parallel programming of supercomputers is given. The focus is on the usage of...
By programming in parallel, a large problem is divided into smaller ones, which are solved concurrently....
In this article we recount the sequence of steps by which MPICH, a high-performance, portable implem...
This paper will present an implementation of parallel block QR factorization with the Compact WY form. ...
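For readers unfamiliar with the term, the Compact WY form mentioned here is the standard blocked representation of a product of Householder reflectors (a textbook identity, stated for convenience, not a result of the paper): $Q = \prod_{i=1}^{k} (I - \tau_i v_i v_i^{T}) = I - V T V^{T}$, where $V = [v_1 \mid \dots \mid v_k]$ collects the Householder vectors and $T$ is a $k \times k$ upper triangular matrix. Storing $V$ and $T$ lets $Q$ be applied with matrix-matrix products, which is what makes blocked QR efficient in parallel.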
Non-uniform memory access (NUMA) architectures are modern shared-memory, multi-core machines offerin...
The Message-Passing Interface (MPI) is a widely-used standard library for programming parallel appli...
Abstract. Over the last decade, the Message Passing Interface (MPI) has become a very successful paralle...
Present and future multi-core computational system architectures attract researchers to utilize this...
In the exascale computing era, applications are executed at a larger scale than ever before, which results ...
In 2008, the Catamount lightweight kernel was extended to support direct access shared memory betwee...
We have implemented eight of the MPI collective routines using MPI point-to-point communication rou...
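As an illustration of the approach this abstract describes, the hedged sketch below re-implements a broadcast purely from MPI point-to-point calls using a binomial tree; the paper's actual routines and algorithms may differ.

    /* Sketch (assumed, not the paper's code): MPI_Bcast rebuilt from
     * point-to-point calls via a binomial tree. */
    #include <mpi.h>
    #include <stdio.h>

    void bcast_p2p(void *buf, int count, MPI_Datatype type,
                   int root, MPI_Comm comm) {
        int rank, size;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);

        int vrank = (rank - root + size) % size;   /* renumber so root is 0 */

        /* Receive from the parent in the binomial tree (root skips this). */
        int mask = 1;
        while (mask < size) {
            if (vrank & mask) {
                int src = (rank - mask + size) % size;
                MPI_Recv(buf, count, type, src, 0, comm, MPI_STATUS_IGNORE);
                break;
            }
            mask <<= 1;
        }

        /* Forward to the children below this rank in the tree. */
        mask >>= 1;
        while (mask > 0) {
            if (vrank + mask < size) {
                int dst = (rank + mask) % size;
                MPI_Send(buf, count, type, dst, 0, comm);
            }
            mask >>= 1;
        }
    }

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int value = 0, rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) value = 42;   /* root's payload */
        bcast_p2p(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
        printf("rank %d received %d\n", rank, value);
        MPI_Finalize();
        return 0;
    }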