The mesh is an architecture that has many scientific applications, and matrix transpose is an important permutation frequently performed in various techniques involving systems of linear equations. In this paper, we present an optimal algorithm for performing matrix transpose on meshes that support wormhole switching. If N is even, our algorithm takes N/(2 + √2) communication steps to perform matrix transpose on an N × N mesh and requires only 3 more steps when the routing is restricted to XY routing, which is supported by most commercial mesh-connected parallel computers. The lower bound is [… N − 1 …] and the best previous bound is about N/3.27. The complexity o
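To make the role of XY routing concrete, the sketch below counts how many routes share each directed link when every node (x, y) of an N × N mesh sends straight to its transpose position (y, x). It is only an illustration, not the algorithm of the paper: the coordinate convention, the function names `xy_route` and `max_link_load`, and the choice to correct the x coordinate before the y coordinate are all assumptions made here.

```python
from collections import Counter


def xy_route(src, dst):
    """Directed links on the dimension-ordered path from src to dst.

    Assumed convention: the x coordinate is corrected first, then y.
    """
    (x, y), (dx, dy) = src, dst
    links = []
    while x != dx:
        nx = x + (1 if dx > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dy:
        ny = y + (1 if dy > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def max_link_load(n):
    """Largest number of direct transpose routes sharing one directed link."""
    load = Counter()
    for x in range(n):
        for y in range(n):
            if x != y:  # diagonal nodes already hold their own data
                load.update(xy_route((x, y), (y, x)))
    return max(load.values()) if load else 0


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, max_link_load(n))
```

Under this convention the busiest link is shared by N − 1 routes (3 routes for N = 4), because every off-diagonal node in a row funnels through the links next to that row's diagonal node. If messages that share a link cannot travel in the same step, a single-hop direct send would therefore need on the order of N steps, which suggests why algorithms with bounds such as N/3.27 relay data through intermediate nodes instead.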
Eklundh's (1972) algorithm to transpose a large matrix stored on an external device such as a disc h...
Efficient routing of messages is the key to the performance of multicomputers. Multicast communicati...
We describe a decomposition for in-place matrix transposition, with applications to Array of Struct...
We give nearly optimal algorithms for matrix transpose on meshes with wormhole and XY routin...
We consider the problem of matrix transpose on mesh-connected processor networks. On the theoretical...
A deadlock-free fully adaptive routing algorithm for 2D meshes which is optimal in the number of vir...
This paper presents implementations of in‐place algorithms for transposing rectangular matrices. One...
This thesis presents a novel algorithm for Transposing Rectangular matrices In-place and in Parallel...
An adaptive parallel matrix transpose algorithm optimized for distributed multicore archit...
The total-exchange is one of the most dense communication patterns and is at the heart of numerous a...
Transposing an N × N array that is distributed row- or column-wise across P = N processors is a fund...
This paper describes parallel matrix transpose algorithms on distributed memory concurrent processor...
The complete exchange (or all-to-all personalized) communication pattern occurs frequently in many i...
We prove the existence of asymptotically optimal routing schedules for deflection worm rout...