To efficiently scale dense linear algebra problems to future exascale systems, communication cost must be avoided or overlapped. Communication-avoiding 2.5D algorithms improve scalability by reducing inter-processor data transfer volume at the cost of extra memory usage. Communication overlap attempts to hide messaging latency by pipelining messages and overlapping with computational work. We study the interaction and compatibility of these two techniques for two matrix multiplication algorithms (Cannon and SUMMA), triangular solve, and Cholesky factorization. For each algorithm, we construct a detailed performance model which considers both critical path dependencies and idle time. We give novel implementations of 2.5D algorithms with over...
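The memory-for-communication trade-off behind 2.5D algorithms can be illustrated with a rough cost calculator. This is a minimal sketch, not the detailed performance model the abstract refers to. It assumes the standard leading-order costs for 2.5D matrix multiplication of n x n matrices on p processors with c replicated copies of the data: memory proportional to c·n²/p, words moved proportional to n²/sqrt(c·p), and messages proportional to sqrt(p/c³). The function name `cost_2p5d_matmul` and its parameters are hypothetical, and constant factors are dropped.

```python
import math


def cost_2p5d_matmul(n: int, p: int, c: int = 1) -> dict:
    """Leading-order per-processor costs for n x n matrix multiplication
    on p processors with c replicated copies of the data.

    c = 1 corresponds to a 2D algorithm such as Cannon or SUMMA.
    Constant factors and lower-order terms are deliberately omitted,
    so only ratios between configurations are meaningful.
    """
    # Replication is only useful up to c = p^(1/3) (the 3D limit).
    if c < 1 or c ** 3 > p:
        raise ValueError("need 1 <= c <= p**(1/3)")
    return {
        "memory_words": c * n * n / p,                 # extra memory: factor c
        "bandwidth_words": n * n / math.sqrt(c * p),   # volume: factor 1/sqrt(c)
        "messages": math.sqrt(p / c ** 3),             # latency: factor 1/c^(3/2)
    }


if __name__ == "__main__":
    for c in (1, 4):
        print(c, cost_2p5d_matmul(n=4096, p=64, c=c))
```

Under these assumptions, going from c = 1 to c = 4 on 64 processors halves the bandwidth term while quadrupling the per-processor memory footprint, which is the trade-off the abstract describes.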
In this paper, we study the implementation of dense linear algebra kernels, such as matrix multiplic...
We study several solvers for the solution of general linear systems where the main objective...
In this paper, we address the issue of implementing matrix multiplication on heterogeneous ...
Modern, massively parallel computers play a fundamental role in a large and ra...
Dense linear algebra computations are essential to nearly every problem in scientific computing and ...
Sparse matrix operations dominate the cost of many scientific applications. In parallel, the perform...
Graph algorithms typically have very low computational intensities, hence their execution times are ...
Multiplication of a sparse matrix with a dense matrix is a building block of an increasing number of...
The solution of dense systems of linear equations is at the heart of numerical computations. Such sy...
Advancements in the field of high-performance scientific computing are necessary to address the most...
This paper initiates the study of communication complexity when the processors have limited work spa...
We present lower bounds on the amount of communication that matrix multiplication algorithms must pe...
Two issues in linear algebra algorithms for multicomputers are addressed. First, how to unify paralle...