We investigate how to leverage the heterogeneous resources of an Asymmetric Multicore Processor (AMP) in order to deliver high performance in the reduction to condensed forms for the solution of dense eigenvalue and singular-value problems. The routines that realize this type of two-sided orthogonal reduction (TSOR) in LAPACK are especially challenging, since a significant fraction of their floating-point operations is cast in terms of memory-bound kernels while the remaining part corresponds to efficient compute-bound kernels. To deal with this scenario: (1) we leverage implementations of memory-bound and compute-bound kernels specifically tuned for AMPs; (2) we select the algorithmic block size for the TSOR routines via a practical mode...
Abstract. Bisection is a parallelizable method for finding the eigenvalues of real symmetric tridiag...
This work presents a strategy to increase the arithmetic intensity of the solvers. Namely, we...
In a recent paper it was shown how memory traffic can be diminished by reformulating the classic alg...
Asymmetric multicore processors (AMPs), such as those present in ARM big.LITTLE technology, have been pro...
We investigate the performance of the routines in LAPACK and the Successive Band Reduction (SBR) too...
Dense linear algebra libraries, such as BLAS and LAPACK, provide a relevant collection of numerical ...
Abstract. Many applications, ranging from big data analytics to nanostructure designs, require the sol...
In this paper we address the reduction of a dense matrix to tridiagonal form for the solution of sym...
Communicated by Yasuaki Ito. Solution of large-scale dense nonsymmetric eigenvalue problem is require...
We present a new method and algorithm for solving several common problems of linear algebra a...
Abstract. The objective of this paper is to extend, in the context of multicore architectures, the c...
As transistor densities increase, it is becoming ever more difficult to gain significant performance ...
As on-node parallelism increases and the performance gap between the processor and the memory system...