We consider the problem of matrix transpose on mesh-connected processor networks. On the theoretical side, we present the first optimal algorithm for matrix transpose on two-dimensional meshes. We then turn to implementation issues, show that the theoretically best bound cannot be achieved in practice, and present an alternative approach that improves practical performance. Finally, we introduce the concept of orthogonalizations, which generalize matrix transposes. We show how to realize them efficiently and present interesting applications of this new technique.
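To make the problem concrete, here is a Python simulation of a simple, non-optimal way to transpose an n × n matrix on an n × n mesh. It is not the optimal algorithm described above. It assumes, as our own illustrative choices rather than the paper's model, that processor (i, j) initially holds a[i][j] and must deliver it to processor (j, i), that packets move store-and-forward under dimension-ordered (XY) routing, that node buffers are unbounded, and that each directed link carries at most one packet per step. The names `mesh_transpose`, `xy_next_hop` and `manhattan` are hypothetical. Because the element at (0, n - 1) must travel 2(n - 1) hops and moves at most one hop per step, any schedule under these assumptions needs at least 2n - 2 steps.

```python
from collections import defaultdict


def manhattan(p, q):
    """Hop distance between two mesh nodes (row, col)."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def xy_next_hop(cur, dest):
    """Dimension-ordered (XY) routing: correct the column first, then the row."""
    r, c = cur
    dr, dc = dest
    if c != dc:
        return (r, c + (1 if dc > c else -1))
    return (r + (1 if dr > r else -1), c)


def mesh_transpose(a):
    """Greedy store-and-forward transpose of an n x n matrix on an n x n mesh.

    Illustrative assumptions: processor (i, j) starts with a[i][j] and must
    deliver it to (j, i); unbounded buffers; one packet per directed link per step.
    Returns (transposed matrix, number of communication steps used).
    """
    n = len(a)
    # Each packet is [value, current node, destination node].
    packets = [[a[i][j], (i, j), (j, i)] for i in range(n) for j in range(n)]
    steps = 0
    while any(cur != dest for _, cur, dest in packets):
        # Collect, per directed link, the packets that want to cross it this step.
        requests = defaultdict(list)
        for idx, (_, cur, dest) in enumerate(packets):
            if cur != dest:
                requests[(cur, xy_next_hop(cur, dest))].append(idx)
        # One packet per link per step: the packet with the longest remaining
        # distance wins, ties broken by lower index (an arbitrary rule).
        for (_, nxt), idxs in requests.items():
            winner = max(idxs, key=lambda k: (manhattan(packets[k][1], packets[k][2]), -k))
            packets[winner][1] = nxt
        steps += 1
    # Read the delivered values back out by final position.
    result = [[None] * n for _ in range(n)]
    for value, (r, c), _ in packets:
        result[r][c] = value
    return result, steps
```

Calling `mesh_transpose` on a 4 × 4 matrix returns its transpose together with a step count that can be checked against the 2n - 2 distance bound. The longest-remaining-distance rule for link contention is only one plausible choice; other rules change the step count but not the final matrix.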
It is proposed to enhance and simplify the programming of a two-dimensional (2-D) torus (and...
This paper describes a set of concurrent algorithms for matrix algebra, based on a library of collec...
This article presents new properties of the mesh array for matrix multiplication. In contrast to the...
The mesh is an architecture that has many scientific applications, and matrix transpose is an import...
We give nearly optimal algorithms for matrix transpose on meshes with wormhole and XY routin...
An adaptive parallel matrix transpose algorithm optimized for distributed multicore archit...
This thesis presents a novel algorithm for Transposing Rectangular matrices In-place and in Parallel...
This paper describes parallel matrix transpose algorithms on distributed memory concurrent processor...
We describe a decomposition for in-place matrix transposition, with applications to Array of Struct...
This paper presents implementations of in‐place algorithms for transposing rectangular matrices. One...
A distributed algorithm with the same functionality as the single-processor level 3 BLAS operation...
Given a rectangular m×n matrix stored as a two-dimensional array, we want to transpose it in...
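The rectangular transposition problem in the entry above can be illustrated with classic cycle following, sketched here under our own assumptions rather than as that paper's method. For an m × n matrix stored row-major in a flat array, the element at index k = i·n + j must move to index j·m + i of the n × m result, which equals k·m mod (mn - 1) for every k except the last index, which stays put. The function name `transpose_in_place` is hypothetical. To use only O(1) extra storage, each cycle is rotated once, starting from its smallest index (its leader); the leader test makes the worst-case running time quadratic, which practical in-place algorithms work to avoid.

```python
def transpose_in_place(buf, m, n):
    """Transpose an m x n row-major matrix held in the flat list `buf`,
    leaving the n x m row-major result in the same list (O(1) extra space).

    The permutation k -> (k * m) % (m * n - 1) sends each element to its
    row-major position in the transpose; indices 0 and m*n - 1 are fixed.
    """
    last = m * n - 1

    def dest(k):
        return (k * m) % last

    for start in range(1, last):
        # Process each cycle only from its smallest index (the cycle leader).
        k = dest(start)
        while k > start:
            k = dest(k)
        if k < start:
            continue  # this cycle was already rotated from a smaller leader
        # Carry values around the cycle: each one lands at its destination and
        # picks up the value it displaces, until we return to `start`.
        value = buf[start]
        k = start
        while True:
            nxt = dest(k)
            buf[nxt], value = value, buf[nxt]
            k = nxt
            if k == start:
                break
    return buf
```

For instance, the 2 × 3 matrix [[0, 1, 2], [3, 4, 5]] stored as [0, 1, 2, 3, 4, 5] becomes [0, 3, 1, 4, 2, 5], the row-major form of its 3 × 2 transpose.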
Eklundh's (1972) algorithm to transpose a large matrix stored on an external device such as a disc h...
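Eklundh's method is usually described for N × N matrices with N a power of two: it makes log2 N passes, and each pass pairs rows that lie 2^s apart and exchanges elements between them, so only two rows need to be held in fast memory at once. The Python sketch below reproduces that pairwise-exchange structure on an in-memory list of rows. It is our illustration, the name `eklundh_transpose` is hypothetical, and a real external-storage version would read and write row pairs from disk instead. Pass s exchanges bit s of the row index with bit s of the column index, so after all passes every element at (r, c) has moved to (c, r).

```python
def eklundh_transpose(rows):
    """Transpose an N x N matrix (N a power of two) by log2(N) passes of
    pairwise row exchanges. Each exchange touches exactly two rows, which is
    what suits the scheme to matrices kept on external storage.
    `rows` is a list of N row lists; it is modified in place and returned.
    """
    n = len(rows)
    if n & (n - 1):
        raise ValueError("N must be a power of two")
    b = 1
    while b < n:
        # Pass with stride b: pair row r (bit b clear) with row r + b.
        for r in range(n):
            if r & b:
                continue
            upper, lower = rows[r], rows[r + b]  # the only two rows needed
            for c in range(n):
                if c & b:
                    # Swap bit b between the row and column indices.
                    upper[c], lower[c - b] = lower[c - b], upper[c]
        b <<= 1
    return rows
```

For a 2 × 2 matrix [[a, b], [c, d]] the single pass swaps b and c, giving [[a, c], [b, d]]; larger sizes repeat the same exchange at strides 2, 4, 8, and so on.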
In this work, we present an approach to alleviate the potential benefit of adder graph algorithms by...