The polyalgorithm library, originally designed in 1991-1993 by Robert Falgout, Jin Li, and Anthony Skjellum, includes fourteen dense matrix multiplication algorithms mapped onto two-dimensional process grids using the Message Passing Interface (MPI). The goal of this thesis is to achieve optimized performance of parallel, dense linear algebra algorithms by varying the algorithm as a function of problem size, shape, data layout, concurrency, and architecture. We integrate these algorithms with an intra-node BLAS DGEMM kernel designed by Thomas Hines (Tennessee Tech), which improves BLAS DGEMM performance in the fat-by-thin region of dense matrix multiplication. We add a rank-k-based SUMMA algorithm, which performs better than rank-1-based SUMMA. ...
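The difference between rank-1 and rank-k SUMMA can be illustrated with a small serial sketch. This is not the thesis's implementation: the function name `matmul_rank_k` and the parameter `k_block` are hypothetical, and the MPI broadcasts of a real SUMMA are only indicated by a comment. The sketch shows that the product is accumulated panel by panel, so a panel width of `k_block` needs about K / `k_block` update rounds instead of K, and each local update becomes a small matrix-matrix product rather than an outer product.

```python
def matmul_rank_k(A, B, k_block):
    """Compute C = A * B as a sequence of rank-k_block updates.

    Serial illustration only. A is m x K and B is K x n, both given as
    lists of rows. Returns (C, number_of_update_rounds).
    """
    m, inner, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    rounds = 0
    for k0 in range(0, inner, k_block):
        k1 = min(k0 + k_block, inner)
        # In a distributed SUMMA, this is the point where the column
        # panel A[:, k0:k1] would be broadcast along process rows and the
        # row panel B[k0:k1, :] along process columns (assumed, not shown).
        for i in range(m):
            for j in range(n):
                C[i][j] += sum(A[i][k] * B[k][j] for k in range(k0, k1))
        rounds += 1
    return C, rounds


if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6]]
    B = [[7, 8], [9, 10], [11, 12]]
    print(matmul_rank_k(A, B, 1))  # rank-1 updates: one round per column of A
    print(matmul_rank_k(A, B, 2))  # rank-2 updates: fewer, larger rounds
```

With `k_block=1` the loop performs one outer-product update per column of A. Larger values give the same result in fewer rounds, which in a distributed setting means fewer, larger broadcasts and local updates that map onto a DGEMM call.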
Some level-2 and level-3 Distributed Basic Linear Algebra Subroutines (DBLAS) that have been impleme...
A number of parallel formulations of the dense matrix multiplication algorithm have been developed. For ...
The thesis investigates the BLAS-3 routine of sparse matrix-matrix multiplication (SpGEMM) based on ...
In HPC, data redistributions (reorganizations) are used in parallel applications to improve performa...
Integrating the polyalgorithm library with optimized linear algebra libraries on HPC platforms, leveragi...
Matrix multiplication is a core building block for numerous scientific computing and, more recently,...
Abstract. Sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many high-performan...
This work is comprised of two different projects in numerical linear algebra. The first project is a...
Matrix multiplication is a basic operation of linear algebra, and has numerous applications to the t...
Combinatorial scientific computing plays an important enabling role in computational science, partic...
Matrix-matrix multiplication is perhaps the most important operation used as a basic building block...
Sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many hi...
We propose a comprehensive and generic framework to minimize multiple and different volume-based com...
High performance, massively-parallel multi-physics simulations are built on efficient mesh data stru...
In this document, we describe two strategies of distribution of computations that can be used to imp...