The QR decomposition with column pivoting (QRP) of a matrix is widely used for rank revealing. The performance of the LAPACK implementation (DGEQP3) of the Householder QRP algorithm is limited by the Level 2 BLAS operations required for updating the column norms. In this paper, we propose an implementation of the QRP algorithm that distributes the matrix columns in a round-robin fashion for better data locality and parallel memory bus utilization on multicore architectures. Our performance results show a 60% improvement over the routine DGEQP3 of Intel MKL (version 10.3) on a 12-core Intel Xeon X5670 machine. In addition, we show that the same data distribution is also suitable for general-purpose GPU processors, where our implementation obt...
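The round-robin column distribution mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a cyclic layout in which column `j` belongs to worker `j mod p`; the abstract does not spell out the exact scheme, so the function names (`distribute_columns`, `select_pivot`, `remaining_load`) and the reduction strategy are hypothetical and are not taken from the paper. The sketch covers only pivot selection by largest column norm and ignores column swaps and the Householder update.

```python
import math

def round_robin_owner(j, num_workers):
    """Worker that owns column j under a cyclic (round-robin) layout (assumed)."""
    return j % num_workers

def distribute_columns(num_cols, num_workers):
    """For each worker, list the global column indices it owns."""
    return [list(range(w, num_cols, num_workers)) for w in range(num_workers)]

def remaining_load(owned, start):
    """Number of not-yet-factored columns (index >= start) held by each worker."""
    return [sum(1 for j in cols if j >= start) for cols in owned]

def select_pivot(columns, owned, start):
    """Pick the remaining column with the largest squared 2-norm.

    Each worker scans only its own columns; the per-worker winners are then
    reduced to one global pivot. Ties go to the lowest column index.
    """
    best = None  # (squared norm, column index)
    for cols in owned:
        for j in cols:
            if j < start:
                continue  # column already factored
            norm_sq = math.fsum(x * x for x in columns[j])
            if best is None or norm_sq > best[0] or (norm_sq == best[0] and j < best[1]):
                best = (norm_sq, j)
    return None if best is None else best[1]

# Columns are stored as lists of entries; squared norms are 1, 9, 8, 1.
columns = [[1.0, 0.0], [0.0, 3.0], [2.0, 2.0], [0.0, 1.0]]
owned = distribute_columns(len(columns), 2)
pivot = select_pivot(columns, owned, 0)
```

With a cyclic layout the columns that remain after each step stay spread almost evenly over the workers, which is one plausible reading of why such a layout helps load balance; `remaining_load` makes that visible for a given step.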
This is a post-peer-review, pre-copyedit version of an article published in Proceedings of 4th Eurom...
In this paper, we introduce a new column selection strategy, named here “Deviation Maximization”, an...
Low-rank matrices arise in many scientific and engineering computations. Both computational and stor...
A fundamental problem when adding column pivoting to the Householder QR factorization is that onl...
We present a parallel algorithm for the QR factorization with column pivoting of a spar...
The pivoted QLP decomposition is computed through two consecutive pivoted QR decompositions, and pro...
A fundamental problem when adding column pivoting to the Householder QR factorization is that only a...
The processing of digital sound signals often requires the computation of the QR factorization ...
In this paper we present an experimental comparison of several numerical tools for computing...
Numerical algorithm runtimes are increasingly dominated by the cost of communication (me...
We present the techniques of adaptive blocking and incremental condition estimation which we believ...
We introduce a parallel algorithm for computing the low rank approximation $A_k$ of a large matrix $...
SuiteSparseQR is a sparse multifrontal QR factorization algorithm. Dense matrix methods within each ...
A novel variant of the parallel QR algorithm for solving dense nonsymmetric eigenvalue problems on h...
Linear least squares problems are commonly solved by QR factorization. When multiple solutio...