Abstract—To efficiently scale dense linear algebra problems to future exascale systems, communication cost must be avoided or overlapped. Communication-avoiding 2.5D algorithms improve scalability by reducing inter-processor data transfer volume at the cost of extra memory usage. Communication overlap attempts to hide messaging latency by pipelining messages and overlapping with computational work. We study the interaction and compatibility of these two techniques for two matrix multiplication algorithms (Cannon and SUMMA), triangular solve, and Cholesky factorization. For each algorithm, we construct a detailed performance model which considers both critical path dependencies and idle time. We give novel implementations of 2.5D algorithms...
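The memory-for-communication trade-off mentioned in the abstract can be illustrated with a rough sketch. It uses the commonly cited asymptotic costs for 2.5D matrix multiplication, with constants dropped; it is not the detailed performance model of the paper, which is not reproduced here. The function name `costs_25d` and the parameters `n` (matrix dimension), `p` (processor count), and `c` (number of data replicas) are assumptions chosen for illustration.

```python
import math


def costs_25d(n, p, c):
    """Asymptotic per-processor costs for multiplying two n x n matrices
    on p processors with c replicas of the data (constants dropped).

    c = 1 corresponds to a 2D algorithm such as Cannon or SUMMA.
    Illustrative only: real costs depend on the machine and implementation.
    """
    if c < 1 or c ** 3 > p:
        raise ValueError("expected 1 <= c <= p^(1/3)")
    return {
        # Words sent per processor shrink by a factor of sqrt(c).
        "words": n * n / math.sqrt(c * p),
        # Memory per processor grows by a factor of c.
        "memory": c * n * n / p,
    }


if __name__ == "__main__":
    for c in (1, 4):
        print(c, costs_25d(1024, 64, c))
```

Under these assumptions, going from `c = 1` to `c = 4` on 64 processors halves the words communicated per processor and quadruples the memory used per processor.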
This paper initiates the study of communication complexity when the processors have limited work spa...
In this paper, we study the implementation of dense linear algebra kernels, such as matrix multiplic...
This paper discusses the scalability of Cholesky, LU, and QR factorization routines on MIMD distribu...
This is a post-peer-review, pre-copyedit version. The final authenticated version is available onlin...
Modern, massively parallel computers play a fundamental role in a large and ra...
Dense linear algebra computations are essential to nearly every problem in scientific computing and ...
Sparse matrix operations dominate the cost of many scientific applications. In parallel, the perform...
Graph algorithms typically have very low computational intensities, hence their execution times are ...
Advancements in the field of high-performance scientific computing are necessary to address the most...
The solution of dense systems of linear equations is at the heart of numerical computations. Such sy...
We study several solvers for the solution of general linear systems where the main objective is to r...
Multiplication of a sparse matrix with a dense matrix is a building block of an increasing number of...
Two issues in linear algebra algorithms for multicomputers are addressed. First, how to unify paralle...