Adaptive Matrix Transpose Algorithms for Distributed Multicore Processors

John C. Bowman
Malcolm Roberts

Publication date

January 2016

Abstract

An adaptive parallel matrix transpose algorithm optimized for distributed multicore architectures running in a hybrid OpenMP/MPI configuration is presented. Significant boosts in speed are observed relative to the distributed transpose used in the state-of-the-art adaptive FFTW library. In some cases, a hybrid configuration allows one to reduce communication costs by reducing the number of MPI nodes and thereby increasing message sizes. This also allows for a more slab-like than pencil-like domain decomposition for multidimensional Fast Fourier Transforms, reducing the cost of, or even eliminating the need for, a second distributed transpose. Nonblocking all-to-all transfers enable user computation and communication to be ove...
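To make the communication pattern and the message-size argument concrete, here is a minimal serial sketch in Python/NumPy, assuming an N × M matrix whose rows are split evenly into P block rows, one per MPI process. It only simulates a generic block all-to-all transpose inside a single process; it is not the adaptive hybrid OpenMP/MPI algorithm described in the paper, and the function names `distributed_transpose` and `message_elements` are hypothetical. In this scheme each process sends one (N/P) × (M/P) block to every other process, so reducing P (for example, by running several OpenMP threads inside each MPI process) enlarges each message quadratically.

```python
import numpy as np


def distributed_transpose(blocks):
    """Serially simulate a block-row distributed transpose over P ranks.

    blocks[r] is the (N/P) x M slab of rows owned by hypothetical rank r.
    Returns a list whose entry s is the (M/P) x N slab of the transposed
    matrix owned by rank s. Assumes N and M are divisible by P.
    """
    P = len(blocks)
    n_loc, M = blocks[0].shape
    m_loc = M // P
    # Step 1: each rank r cuts its slab into P column blocks; block s is destined for rank s.
    send = [[blocks[r][:, s * m_loc:(s + 1) * m_loc] for s in range(P)]
            for r in range(P)]
    # Step 2: all-to-all exchange; rank s receives block s from every rank r.
    recv = [[send[r][s] for r in range(P)] for s in range(P)]
    # Step 3: each rank transposes the received blocks locally and places
    # the block from rank r in column block r of its output slab.
    return [np.hstack([b.T for b in recv[s]]) for s in range(P)]


def message_elements(N, M, P):
    """Number of matrix elements in each point-to-point message of the exchange."""
    return (N // P) * (M // P)
```

For a 1024 × 1024 matrix, `message_elements(1024, 1024, 64)` gives 256 elements per message, whereas with 16 processes it gives 4096: using four times fewer processes produces sixteen times larger (and fewer) messages.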

