In this paper, we analyse and compare the techniques of algorithmic blocking and (storage blocking with) lookahead for distributed memory LU, LL^T and QR factorizations. We explore the concepts and some useful properties of a simplified model of lookahead, including the minimal degree of lookahead required for optimal performance. We discuss issues in the implementation of lookahead, which are more involved for the LL^T and QR factorizations, and explain how hybrid algorithmic blocking and lookahead techniques can be implemented. Implications for parallel linear algebra library design are also discussed. Results are given on the Fujitsu AP1000 and AP+ multicomputers, which have relatively high communication to computation ...
We present the techniques of adaptive blocking and incremental condition estimation which we believ...
We investigate a parallelization strategy for dense matrix factorization (DMF) algorithms, using Ope...
Dense linear algebra represents fundamental building blocks in many computational science and engine...
Our experimental results showed that block based algorithms for numerically intensive applications a...
The solution of dense systems of linear equations is at the heart of numerical computations. Such sy...
This paper discusses the scalability of Cholesky, LU, and QR factorization routines on MIMD distribu...
This paper discusses optimizing computational linear algebra algorithms on a ring cluster of IBM R...
We present parallel and sequential dense QR factorization algorithms that are ...
This paper presents a 7-step, semi-systematic approach for designing and implementing para...
As multicore systems continue to gain ground in the high performance computing...
This report builds on the work done in the deliverable [Nava94]. There it was shown tha...
This dissertation details contributions made by the author to the field of computer science while wo...
With the emergence of thread-level parallelism as the primary means for continued improvement of per...
This paper provides an introduction to algorithms for fundamental linear algebra problems on various...
This report addresses several important aspects of parallel implementation of QR decomposition of a ...