This paper presents work on the LU factorization from the ScaLAPACK library. First, a complexity analysis is given: it allows computing the optimal block size for the block-scattered distribution used in ScaLAPACK LU, and it identifies the communication phases that are worth overlapping. Second, two optimizations based on computation/communication overlap are presented, with experimental results on the Intel Paragon and the IBM SP2.

1 Introduction

The LU factorization is the kernel of many applications, so the importance of optimizing this routine needs no further justification given the increasing demand from applications dealing with large matrices. An efficient parallel implementation can bring real improvements in the executio...
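To make the structure being optimized concrete, the following is a minimal single-process sketch of the right-looking blocked LU factorization that underlies ScaLAPACK's routine. It is an illustrative simplification only: ScaLAPACK's PDGETRF adds partial pivoting and a 2-D block-cyclic data distribution, and the trailing-matrix update (step 3) is the GEMM-dominated phase whose communication the paper's optimizations overlap with computation. The function name and the no-pivoting assumption are ours, not the paper's.

```python
import numpy as np

def block_lu(A, nb):
    """Right-looking blocked LU factorization, in place, without pivoting.

    Simplified sketch: one process, no pivoting (so A should be, e.g.,
    diagonally dominant). On return, A holds the unit-lower factor L
    below the diagonal and U on and above it.
    """
    n = A.shape[0]
    for k in range(0, n, nb):
        kb = min(nb, n - k)
        # 1. Panel factorization: unblocked LU of the column panel A[k:n, k:k+kb].
        for j in range(k, k + kb):
            A[j+1:n, j] /= A[j, j]
            A[j+1:n, j+1:k+kb] -= np.outer(A[j+1:n, j], A[j, j+1:k+kb])
        # 2. Triangular solve for the U row block: U12 = L11^{-1} A12.
        L11 = np.tril(A[k:k+kb, k:k+kb], -1) + np.eye(kb)
        A[k:k+kb, k+kb:n] = np.linalg.solve(L11, A[k:k+kb, k+kb:n])
        # 3. Trailing-matrix update (dominant GEMM): A22 -= L21 @ U12.
        A[k+kb:n, k+kb:n] -= A[k+kb:n, k:k+kb] @ A[k:k+kb, k+kb:n]
    return A

# Usage: factor a diagonally dominant matrix and check L @ U recovers it.
rng = np.random.default_rng(0)
n = 8
A0 = rng.random((n, n)) + n * np.eye(n)
A = block_lu(A0.copy(), nb=3)
L = np.tril(A, -1) + np.eye(n)
U = np.triu(A)
print(np.allclose(L @ U, A0))
```

The block size nb is exactly the tuning parameter the complexity analysis in the paper optimizes: larger blocks improve the GEMM efficiency of step 3 but lengthen the serial panel factorization on the critical path.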
Dense LU factorization is a prominent benchmark used to rank the performance of supercomput...
A new parallel algorithm for the LU factorization of a given dense matrix A is described. Th...
The ScaLAPACK library for parallel dense matrix computations is built on top of the BLACS communicat...
This article discusses the core factorization routines included in the ScaLAPACK library. These rout...
This paper presents CALU, a Communication Avoiding algorithm for the LU factorization of dense matri...
This paper discusses the scalability of Cholesky, LU, and QR factorization routines on MIMD distribu...
This paper considers key ideas in the design of out-of-core dense LU factorization routines. A left...
Due to the evolution of massively parallel computers towards deeper levels of parallelism and memory...
This paper discusses the design and the implementation of the LU factorization routines included in ...
This paper presents a parallel LU factorization algorithm designed to take advantage of physical bro...
In this paper, we make efficient use of pipelining on LU decomposition with pivoting and a column-sc...
This paper describes ...
In this paper, we make efficient use of asynchronous communications on the LU decomposition algorit...
We present parallel and sequential dense QR factorization algorithms that are ...