Dense linear algebra routines are fundamental building blocks in many computational science and engineering applications. These algorithms must be numerically stable, robust, and reliable to be usable as black-box solvers by expert and non-expert users alike. They also need to scale and run efficiently on massively parallel computers with multi-core nodes. Developing high-performance algorithms for dense matrix computations is a challenging task, especially since the widespread adoption of multi-core architectures. Cache reuse is an even more critical issue on multi-core processors than on uni-core processors because of their greater computational power and more complex memory hierarchies. Blocked matrix stor...
To face the advent of multicore processors and the ever increasing complexity of hardware architectu...
Problems in the class of unstructured sparse matrix computations are characterized by highly irregul...
In this article, we present a fast algorithm for matrix multiplication optimized for recent ...
Solving linear systems is an important tool in many areas of science. The solving o...
This thesis considers two problems in numerical linear algebra and high performance computing (HPC):...
With the emergence of thread-level parallelism as the primary means for continued improvement of per...
This paper discusses the scalability of Cholesky, LU, and QR factorization routines on MIMD distribu...
This report builds on the work done in the deliverable [Nava94]. There it was shown tha...
The solution of dense systems of linear equations is at the heart of numerical computations. Such sy...
In this paper, we analyse and compare the techniques of algorithmic blocking and (storage blocking w...
Matrix computations lie at the heart of most scientific computational tasks. The solution of linear ...
The objective of this paper is to extend and redesign the block matrix reduction applied for the fam...
We investigate a parallelization strategy for dense matrix factorization (DMF) algorithms, using Ope...
As multicore systems continue to gain ground in the high performance computing...
This paper discusses optimizing computational linear algebra algorithms on a ring cluster of IBM R...