Block-cyclic order elimination algorithms for LU and QR factorization and solve routines are described for distributed memory architectures with processing nodes configured as two-dimensional arrays of arbitrary shape. Cyclic order elimination combined with a consecutive data allocation yields good load balance in both the factorization and solution phases when dense systems of equations are solved by LU or QR decomposition. Blocking may offer a substantial performance enhancement on architectures for which the level-2 or level-3 BLAS are ideal for operations local to a node. High-rank updates local to a node may perform a factor of four or more better than a rank-1 update. We show that in many parallel implemen...
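The load-balance claim in the abstract above can be illustrated with a small sketch. It is a deliberately simplified one-dimensional model and not the paper's algorithm: it assigns columns to processors block-cyclically and eliminates in natural order, a closely related arrangement, rather than using cyclic order elimination with consecutive allocation on a two-dimensional array. The function names, the cost model (one unit of work per updated matrix entry), and the sizes `n = 64`, `nprocs = 4`, `block = 2` are all assumptions chosen for illustration.

```python
def owner_block_cyclic(j, block, nprocs):
    """Processor owning column j when blocks of `block` columns are dealt out cyclically."""
    return (j // block) % nprocs


def owner_contiguous(j, n, nprocs):
    """Processor owning column j when each processor holds one contiguous slab."""
    cols_per_proc = -(-n // nprocs)  # ceiling division
    return j // cols_per_proc


def update_work_per_proc(n, nprocs, owner):
    """Count trailing-update work per processor for an n x n elimination.

    Assumed cost model: at step k, every column j > k has its n - k - 1
    sub-diagonal entries updated, and the owner of column j does that work.
    """
    work = [0] * nprocs
    for k in range(n - 1):
        rows = n - k - 1
        for j in range(k + 1, n):
            work[owner(j)] += rows
    return work


def imbalance(work):
    """Ratio of the busiest processor's work to the mean (1.0 is perfect balance)."""
    return max(work) / (sum(work) / len(work))


n, nprocs, block = 64, 4, 2  # hypothetical sizes
cyclic = update_work_per_proc(n, nprocs, lambda j: owner_block_cyclic(j, block, nprocs))
contiguous = update_work_per_proc(n, nprocs, lambda j: owner_contiguous(j, n, nprocs))

print("block-cyclic imbalance:", round(imbalance(cyclic), 3))
print("contiguous imbalance:  ", round(imbalance(contiguous), 3))
```

Under this model the total work is identical for both layouts; only its distribution changes. With a contiguous layout and natural elimination order, the processors holding the leading columns fall idle early, whereas the cyclic arrangement keeps every processor involved until close to the end of the factorization.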
We present the techniques of adaptive blocking and incremental condition estimation which we believ...
We describe a subset of the level-1, level-2, and level-3 BLAS implemented for each node of the Conn...
In this paper we study the performance of two classical dense linear algebra a...
The solution of dense systems of linear equations is at the heart of numerical computations. Such sy...
This paper discusses optimizing computational linear algebra algorithms on a ring cluster of IBM R...
In this paper, we present a new load balancing technique, called panel scattering, which is generall...
Dense LU factorization is a prominent benchmark used to rank the performance of supercomput...
A statically scheduled parallel block QR factorization procedure is described. It is based on "bloc...
This paper discusses the scalability of Cholesky, LU, and QR factorization routines on MIMD distribu...
Our experimental results showed that block based algorithms for numerically intensive applications a...
We present parallel and sequential dense QR factorization algorithms that are ...
With the emergence of thread-level parallelism as the primary means for continued improvement of per...
This paper presents CALU, a Communication Avoiding algorithm for the LU factorization of dense matri...
In this paper, we analyse and compare the techniques of algorithmic blocking and (storage blocking w...
To exploit the potential of multicore architectures, recent dense linear algeb...