This paper presents a parallel LU factorization algorithm designed to take advantage of physical broadcast communication facilities and of overlapping communication with computation. Physical broadcast is directly available in Ethernet network hardware, one of the most widely used interconnection networks in clusters currently installed for parallel computing. Overlapped communication is a well-known strategy for hiding communication latency, which is one of the most common sources of parallel performance penalties. A performance analysis of the proposed parallel LU factorization algorithm and experimental results are presented. The performance of the proposed algorithm is also compared with that of the algorithm used in ScaLAPACK (Scalable LAPACK)...
This paper presents CALU, a Communication Avoiding algorithm for the LU factorization of dense matri...
The number of cores in multicore computers has an irreversible tendency to increase. Also, computers...
We present parallel and sequential dense QR factorization algorithms that are ...
This paper presents some works on the LU factorization from the ScaLAPACK library. First, a complexi...
In this paper, we make efficient use of asynchronous communications on the LU decomposition algorit...
This paper considers key ideas in the design of out-of-core dense LU factorization routines. A left...
This thesis focuses on parallel computing on installed local area networks (LANs), analyzing prob...
In this paper, we present a method for overlapping communications on parallel ...
Due to the evolution of massively parallel computers towards deeper levels of parallelism and memory...
Some common guidelines that can be used to design parallel algorithms under the single-c...
International peer-reviewed conference with proceedings. This paper describes ...
A new parallel algorithm for the LU factorization of a given dense matrix A is described. Th...
Matrix multiplication is taken as a test bed for parallel processing on heterogeneous networks of wo...
The paper proposes an analytical model for estimating the performance of the Pipelined Ring algorithm fo...