A distributed algorithm with the same functionality as the single-processor level 3 BLAS operation GEMM, i.e., general matrix multiply and add, is presented. By "the same functionality" we mean the ability to perform GEMM operations on arbitrary subarrays of the matrices involved. The logical network is a 2D square mesh with torus connectivity. The matrices involved are distributed with a non-scattered blocked data distribution. The algorithm consists of two main parts: alignment and data movement of the subarrays involved in the operation, and a distributed blocked matrix multiplication algorithm on (sub)matrices using only a square submesh. Our general approach makes it possible to perform GEMM operations on non-overlapping submeshes simultane...
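The abstract does not spell out the blocked multiplication phase. A classic algorithm with the same shape is Cannon's algorithm: it starts with an alignment (skew) step and then performs nearest-neighbour shifts on a square torus. The sketch below uses Cannon's algorithm for illustration only; it is not necessarily the authors' method. It simulates the algorithm in pure Python on a q x q grid of "processors" held in nested lists, with a non-scattered blocked distribution. The function names and the alpha/beta scaling, which is modelled on BLAS GEMM (C <- alpha*A*B + beta*C), are assumptions.

```python
# Cannon's algorithm (swapped in for illustration) simulated on a q x q torus.
# Each "processor" (i, j) is a slot in a nested list holding one block; no real messages.

def matmul(X, Y):
    """Dense product of two list-of-lists matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def add(X, Y):
    """Elementwise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def split_blocks(M, q):
    """Non-scattered blocked distribution: block (i, j) is owned by processor (i, j)."""
    b = len(M) // q
    return [[[row[j * b:(j + 1) * b] for row in M[i * b:(i + 1) * b]]
             for j in range(q)] for i in range(q)]


def join_blocks(G):
    """Reassemble a q x q grid of b x b blocks into one matrix."""
    q, b = len(G), len(G[0][0])
    return [[G[i][j][r][c] for j in range(q) for c in range(b)]
            for i in range(q) for r in range(b)]


def cannon_gemm(alpha, A, B, beta, C, q):
    """Return alpha*A*B + beta*C for n x n matrices, assuming q divides n."""
    n = len(A)
    assert n % q == 0, "this sketch assumes q divides n"
    b = n // q
    Ab, Bb = split_blocks(A, q), split_blocks(B, q)
    # Alignment: shift row i of A left by i and column j of B up by j,
    # so processor (i, j) holds A_{i, i+j} and B_{i+j, j} (indices mod q).
    Ab = [[Ab[i][(i + j) % q] for j in range(q)] for i in range(q)]
    Bb = [[Bb[(i + j) % q][j] for j in range(q)] for i in range(q)]
    acc = [[[[0] * b for _ in range(b)] for _ in range(q)] for _ in range(q)]
    for _ in range(q):
        # Local block multiply-add on every processor.
        acc = [[add(acc[i][j], matmul(Ab[i][j], Bb[i][j])) for j in range(q)]
               for i in range(q)]
        # Nearest-neighbour shifts with torus wrap-around: A moves left, B moves up.
        Ab = [[Ab[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        Bb = [[Bb[(i + 1) % q][j] for j in range(q)] for i in range(q)]
    AB = join_blocks(acc)
    return [[alpha * x + beta * y for x, y in zip(rab, rc)] for rab, rc in zip(AB, C)]
```

To apply this to a subarray, one would first slice out the relevant square submatrices and write the result back afterwards. In the abstract, that work belongs to the alignment and data-movement phase, which this sketch omits.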
This paper describes a set of concurrent algorithms for matrix algebra, based on a library of collec...
Abstract. In this paper, we address the issue of implementing matrix multiplication on heterogeneous ...
Sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many hi...
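SpGEMM computes C = AB, where the operands (and usually the product) are sparse. The excerpt is cut off before it describes its parallel method. As an illustration of the primitive itself, the sketch below uses Gustavson's row-by-row formulation with a sparse accumulator. The dict-of-rows storage is a hypothetical stand-in for a CSR format, and the name spgemm is an assumption.

```python
from typing import Dict

# Hypothetical CSR stand-in: row index -> {column index: nonzero value}.
SparseMatrix = Dict[int, Dict[int, float]]


def spgemm(A: SparseMatrix, B: SparseMatrix) -> SparseMatrix:
    """Gustavson's algorithm: row i of C is the sum of A[i][k] * (row k of B)."""
    C: SparseMatrix = {}
    for i, a_row in A.items():
        acc: Dict[int, float] = {}  # sparse accumulator for row i of C
        for k, a_ik in a_row.items():
            for j, b_kj in B.get(k, {}).items():
                acc[j] = acc.get(j, 0.0) + a_ik * b_kj
        # Drop entries that cancelled to zero so C stays truly sparse.
        nonzeros = {j: v for j, v in acc.items() if v != 0.0}
        if nonzeros:
            C[i] = nonzeros
    return C
```

Each row of C touches only the rows of B selected by the nonzeros of the matching row of A. This is why row-wise SpGEMM parallelizes naturally over rows of A.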
We present a new fast and scalable matrix multiplication algorithm, called DIMMA (Distribution-Indep...
In this paper, the index space of the (n×n)-matrix multiply-add problem C = C + A·B is represented as...
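For reference, the index space of C = C + A·B is the set of n^3 integer points (i, j, k), each standing for one scalar multiply-add. The excerpt is cut off before it says how the paper represents this set, so the block below gives only the standard definition.

```latex
\mathcal{I} = \{(i,j,k) \in \mathbb{Z}^3 : 1 \le i, j, k \le n\}, \qquad
c_{ij} \leftarrow c_{ij} + a_{ik}\, b_{kj} \quad \text{for each } (i,j,k) \in \mathcal{I}.
```

Projecting a point along k, j, and i gives the indices (i, j), (i, k), and (k, j) of the entries of C, A, and B that it touches. Consequently, any assignment of points of the cube to processors determines which data each processor needs.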
Some level-2 and level-3 Distributed Basic Linear Algebra Subroutines (DBLAS) that have been impleme...
The distributed matrix multiplication problem with an unknown number of stragglers is considered, wh...
Abstract. Sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many high-performan...
This paper describes parallel matrix transpose algorithms on distributed memory concurrent processor...
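The excerpt is truncated, and the paper's algorithms may target more general layouts. The sketch below shows only the simplest case: a square q x q mesh with a blocked distribution, where (A^T)_ij = (A_ji)^T. Data movement is simulated by list assignment, and the function names are hypothetical.

```python
from typing import List

Block = List[List[float]]


def local_transpose(blk: Block) -> Block:
    """Transpose one processor's local block."""
    return [list(col) for col in zip(*blk)]


def distributed_transpose(grid: List[List[Block]]) -> List[List[Block]]:
    """Transpose a matrix held in a blocked distribution on a q x q mesh.

    Because (A^T)_ij = (A_ji)^T, processor (i, j) transposes its block locally
    and "sends" it to processor (j, i); diagonal processors keep their block.
    """
    q = len(grid)
    out: List[List[Block]] = [[[] for _ in range(q)] for _ in range(q)]
    for i in range(q):
        for j in range(q):
            out[j][i] = local_transpose(grid[i][j])  # message from (i, j) to (j, i)
    return out
```

On a real mesh, the off-diagonal exchanges are the expensive part, because block (i, j) must travel to the mirror-image processor (j, i).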
In this paper, I explain a previously published three-dimensional algorithm for multiplying two two-...
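The excerpt stops before it describes the algorithm. A widely taught "three-dimensional" scheme arranges p = q^3 processors in a cube and gives processor (i, j, k) the blocks A_ik and B_kj. Each processor multiplies its two blocks, and the partial products are then summed along k. Whether this matches the algorithm the paper explains is an assumption. The sketch simulates that scheme sequentially, and the names cube_matmul and block are hypothetical.

```python
# Sequential simulation of the processor-cube ("3D") matrix multiplication.
# Processor (i, j, k) of a q x q x q cube is a dictionary key; no real messages are sent.

def matmul(X, Y):
    """Dense product of two list-of-lists matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def block(M, r, c, b):
    """Return the b x b block in block-row r, block-column c of M."""
    return [row[c * b:(c + 1) * b] for row in M[r * b:(r + 1) * b]]


def cube_matmul(A, B, q):
    """Return A*B for n x n matrices on a simulated q^3 processor cube (q divides n)."""
    n = len(A)
    assert n % q == 0, "this sketch assumes q divides n"
    b = n // q
    # Phase 1: processor (i, j, k) holds A_ik and B_kj and forms their product;
    # all q^3 block products are independent of one another.
    partial = {(i, j, k): matmul(block(A, i, k, b), block(B, k, j, b))
               for i in range(q) for j in range(q) for k in range(q)}
    # Phase 2: reduction along k, C_ij = sum over k of A_ik * B_kj.
    C = [[0] * n for _ in range(n)]
    for (i, j, _k), P in partial.items():
        for r in range(b):
            for c in range(b):
                C[i * b + r][j * b + c] += P[r][c]
    return C
```

The design trades memory for parallelism: each block of A and B is replicated q times across the cube, so all q^3 block products can proceed at once.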
In this paper we present an efficient dense matrix multiplication algorithm for distributed memory ...
Using super-resolution techniques to estimate the direction from which a signal arrived at a radio receive...
We describe a subset of the level-1, level-2, and level-3 BLAS implemented for each node of the Conn...
Many optimizations (of programs with loops) used in parallelizing compilers and systolic array desig...
Abstract. We consider the realization of matrix-matrix multiplication and propose a hierarchical alg...