Recently, MPI implementations have been extended to support accelerator devices such as the Intel Many Integrated Core (MIC) coprocessor and NVIDIA GPUs. This has been accomplished through changes at different levels of the software stack and within the MPI implementations themselves. To evaluate the performance and scalability of accelerator-aware MPI libraries, we developed portable micro-benchmarks to identify factors that influence the efficiency of primitive MPI point-to-point and collective operations. These benchmarks have been implemented in OpenACC, CUDA and OpenCL. On the Intel MIC platform, existing MPI benchmarks can be executed with appropriate mapping onto the MIC and CPU cores. Our results demonstrate that the MPI operations are highly sensitive to the memory and I/O bus c...
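The point-to-point measurements described above can be reproduced in miniature with a ping-pong loop over device-resident buffers. The sketch below is illustrative only, assuming a CUDA-aware MPI library that accepts device pointers in MPI_Send/MPI_Recv; the message size and iteration count are arbitrary choices, not values from the study.

```c
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

/* Minimal ping-pong latency micro-benchmark over GPU device buffers.
 * Requires a CUDA-aware MPI that accepts device pointers directly. */
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int nbytes = 1 << 20;   /* 1 MiB message (arbitrary) */
    const int iters  = 100;
    char *dbuf;
    cudaMalloc((void **)&dbuf, nbytes);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(dbuf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(dbuf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(dbuf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(dbuf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();
    if (rank == 0)
        printf("avg one-way latency: %.2f us\n",
               1e6 * (t1 - t0) / (2.0 * iters));

    cudaFree(dbuf);
    MPI_Finalize();
    return 0;
}
```

Run with two ranks (e.g., mpirun -np 2 ./pingpong); sweeping the message size in such a loop exposes the kind of memory- and bus-sensitivity the abstract discusses.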
This report describes the development of MPI parallelization support on top of the existing OpenM...
During the past decade, accelerators, such as NVIDIA CUDA GPUs and Intel Xeon Phis, have seen an inc...
MPI is the predominant model for parallel programming in technical high-performance computing. With ...
Data movement in high-performance computing systems accelerated by graphics processing unit...
Scientific developers face challenges adapting software to leverage increasingly heterogeneous archi...
Heterogeneous supercomputers are now considered the most valuable solution to ...
In order to reach exascale computing capability, accelerators have become a crucial part of developi...
Coprocessors based on the Intel Many Integrated Core (MIC) Architecture have been adopted in many high-performance computer clusters. Typical parall...
Current trends in computing and system architecture point towards a need for accelerators such as GP...
In recent years, GPU computing has been very popular for scientific applications, especially after t...
Current implementations of MPI are unaware of accelerator memory (i.e., GPU device memory) ...
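The workaround that this limitation forces on applications is to stage device data through host memory around every MPI call. The sketch below contrasts that staged path with the direct path a GPU-aware MPI enables; the function names are hypothetical and error checking is omitted.

```c
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdlib.h>

/* Staged send: MPI sees only host pointers, so the device buffer
 * must first be copied into a host bounce buffer. */
void send_staged(const float *dev_buf, int count, int dest, MPI_Comm comm) {
    float *host_buf = (float *)malloc(count * sizeof(float));
    cudaMemcpy(host_buf, dev_buf, count * sizeof(float),
               cudaMemcpyDeviceToHost);
    MPI_Send(host_buf, count, MPI_FLOAT, dest, 0, comm);
    free(host_buf);
}

/* Direct send: an accelerator-aware MPI accepts the device pointer
 * itself and picks the transfer path (pipelined copies, GPUDirect, ...). */
void send_direct(const float *dev_buf, int count, int dest, MPI_Comm comm) {
    MPI_Send((void *)dev_buf, count, MPI_FLOAT, dest, 0, comm);
}
```

Besides the extra copy, the staged version serializes the cudaMemcpy and the network transfer, which is the overlap opportunity accelerator-aware implementations aim to recover.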
This paper reports on the development of an MPI/OpenCL implementation of LU, an application-level be...
In this session you will learn how to program multi-GPU systems or GPU cluster...
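A first step in any such multi-GPU MPI program is pinning each rank to one device. Below is a minimal sketch, assuming one rank per GPU and using an MPI-3 shared-memory split to obtain the node-local rank (the function name is hypothetical):

```c
#include <mpi.h>
#include <cuda_runtime.h>

/* Bind the calling MPI rank to one GPU on its node, using the
 * node-local rank derived from an MPI-3 shared-memory communicator. */
int bind_rank_to_gpu(void) {
    MPI_Comm node_comm;
    int local_rank, ndevices;

    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);
    MPI_Comm_rank(node_comm, &local_rank);
    MPI_Comm_free(&node_comm);

    cudaGetDeviceCount(&ndevices);
    cudaSetDevice(local_rank % ndevices);  /* wrap if ranks > GPUs */
    return local_rank % ndevices;
}
```

Calling this right after MPI_Init gives each rank its own device context before any communication or kernel launches take place.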
The OpenCL standard allows targeting a large variety of CPU, GPU and accelerator architectures using...
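That portability is visible already at device discovery: the same host code enumerates whatever CPUs, GPUs, and accelerators the installed platforms expose. A small sketch using only the standard OpenCL host API:

```c
#include <CL/cl.h>
#include <stdio.h>

/* Enumerate every device of every installed OpenCL platform,
 * whether it is a CPU, a GPU, or another accelerator. */
int main(void) {
    cl_platform_id platforms[8];
    cl_uint nplat = 0;
    clGetPlatformIDs(8, platforms, &nplat);

    for (cl_uint p = 0; p < nplat; p++) {
        cl_device_id devices[16];
        cl_uint ndev = 0;
        if (clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, 16,
                           devices, &ndev) != CL_SUCCESS)
            continue;
        for (cl_uint d = 0; d < ndev; d++) {
            char name[256];
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME,
                            sizeof(name), name, NULL);
            printf("platform %u, device %u: %s\n", p, d, name);
        }
    }
    return 0;
}
```

The same host binary, linked against a different vendor's OpenCL runtime, will list and can target that vendor's hardware, which illustrates the portability argument the abstract makes.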
The current trend toward multicore architectures underscores the need for parallelism. While ne...