We consider the problem of matrix transpose on mesh-connected processor networks. On the theoretical side, we present the first optimal algorithm for matrix transpose on two-dimensional meshes. We then consider implementation issues, show that the theoretically best bound cannot be achieved, and present an alternative approach that improves practical performance. Finally, we introduce the concept of orthogonalizations, which are generalizations of matrix transposes. We show how to realize them efficiently and present interesting applications of this new technique.
This paper describes a set of concurrent algorithms for matrix algebra, based on a library of collec...
This article presents new properties of the mesh array for matrix multiplication. In contrast to the...
It is proposed to enhance and simplify the programming of a two dimensional (2-D) torus (and...
The mesh is an architecture that has many scientific applications, and matrix transpose is an import...
We give nearly optimal algorithms for matrix transpose on meshes with wormhole and XY routin...
An adaptive parallel matrix transpose algorithm optimized for distributed multicore archit...
This paper describes parallel matrix transpose algorithms on distributed memory concurrent processor...
This paper presents implementations of in‐place algorithms for transposing rectangular matrices. One...
This thesis presents a novel algorithm for Transposing Rectangular matrices In-place and in Parallel...
We describe a decomposition for in-place matrix transposition, with applications to Array of Struct...
A distributed algorithm with the same functionality as the single-processor level 3 BLAS operation...
Transposing an N × N array that is distributed row- or column-wise across P = N processors is a fund...
In this work, we present an approach to alleviate the potential benefit of adder graph algorithms by...
Eklundh's (1972) algorithm to transpose a large matrix stored on an external device such as a disc h...