Abstract. Many applications, ranging from big data analytics to nanostructure designs, require the solution of large dense singular value decomposition (SVD) or eigenvalue problems. A first step in the solution methodology for these problems is the reduction of the matrix at hand to condensed form by two-sided orthogonal transformations. This step is standard practice and significantly accelerates the solution process. We present a performance analysis of the main two-sided factorizations used in these reductions, namely the bidiagonalization, tridiagonalization, and upper Hessenberg factorizations, on heterogeneous systems of multicore CPUs and Xeon Phi coprocessors. We derive a performance model and use it to guide the analysis and to evaluate perf...
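The two-sided reductions named in the abstract above apply orthogonal transformations from both the left and the right of the matrix. As a hedged illustration only (a textbook unblocked variant, not the blocked CPU/Xeon Phi implementation the paper analyzes), the sketch below reduces a square matrix to upper Hessenberg form with Householder reflectors, A <- H_k A H_k. For a symmetric input the same similarity transformation yields a tridiagonal matrix. The names `householder_vector` and `hessenberg` are illustrative and do not come from LAPACK or the paper.

```python
import math


def householder_vector(x):
    """Return (v, beta) such that (I - beta * v v^T) x = alpha * e1."""
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    if norm_x == 0.0:
        return [0.0] * len(x), 0.0  # nothing to annihilate
    # Pick alpha with sign opposite to x[0] to avoid cancellation in v[0].
    alpha = -norm_x if x[0] > 0 else norm_x
    v = list(x)
    v[0] -= alpha
    beta = 2.0 / sum(vi * vi for vi in v)
    return v, beta


def hessenberg(a):
    """Return an upper Hessenberg matrix orthogonally similar to square matrix a.

    Step k applies a reflector H_k from both sides, A <- H_k A H_k, which
    annihilates (up to rounding) column k below the first subdiagonal.
    Unblocked and O(n^3); for illustration only.
    """
    n = len(a)
    h = [row[:] for row in a]  # work on a copy
    for k in range(n - 2):
        v, beta = householder_vector([h[i][k] for i in range(k + 1, n)])
        if beta == 0.0:
            continue
        rows = range(k + 1, n)
        # Left update: h <- H_k h (touches rows k+1..n-1).
        for j in range(n):
            s = beta * sum(v[i - k - 1] * h[i][j] for i in rows)
            for i in rows:
                h[i][j] -= s * v[i - k - 1]
        # Right update: h <- h H_k (touches columns k+1..n-1).
        for i in range(n):
            s = beta * sum(h[i][j] * v[j - k - 1] for j in rows)
            for j in rows:
                h[i][j] -= s * v[j - k - 1]
    return h


if __name__ == "__main__":
    A = [[4.0, 1.0, 2.0, 3.0],
         [1.0, 3.0, 0.0, 1.0],
         [2.0, 0.0, 2.0, 1.0],
         [3.0, 1.0, 1.0, 5.0]]
    for row in hessenberg(A):  # symmetric input, so the result is tridiagonal
        print(["%8.4f" % x for x in row])
```

Because every step is a similarity transformation, the eigenvalues, the trace, and the Frobenius norm are preserved, which makes them convenient sanity checks. Production libraries accumulate several reflectors and apply them together so that part of the work runs as matrix-matrix (BLAS-3) operations.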
Matrix factorization (often called decomposition) is a frequently used kernel in a large number o...
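The abstract above is cut off before it names a specific factorization, so as one representative kernel, here is a minimal unblocked LU factorization with partial pivoting (P A = L U) in pure Python. The names `lu_partial_pivot` and `unpack` are hypothetical. The packed storage (unit lower triangular L below the diagonal, U on and above it) mirrors LAPACK's `getrf`, but the permutation is returned as an explicit row-index list rather than LAPACK's sequence of pivot swaps, which is a simplification for this sketch.

```python
def lu_partial_pivot(a):
    """Factor square matrix a as P A = L U using partial (row) pivoting.

    Returns (perm, lu): row i of P A is a[perm[i]]; lu holds the unit lower
    triangular L strictly below the diagonal and U on and above it.
    """
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    for k in range(n):
        # Pivot: row with the largest |entry| in column k, at or below row k.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]  # multiplier l_ik
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]  # Schur complement update
    return perm, lu


def unpack(lu):
    """Split packed storage into explicit L (unit diagonal) and U."""
    n = len(lu)
    L = [[lu[i][j] if j < i else float(i == j) for j in range(n)] for i in range(n)]
    U = [[lu[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return L, U


if __name__ == "__main__":
    perm, lu = lu_partial_pivot([[2.0, 1.0, 1.0], [4.0, -6.0, 0.0], [-2.0, 7.0, 2.0]])
    L, U = unpack(lu)
    print(perm, L, U, sep="\n")
```

Pivoting on the largest entry in each column keeps the multipliers bounded by 1 in magnitude, which is the usual reason partial pivoting is the default for general dense LU.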
We investigate the performance of the routines in LAPACK and the Successive Band Reduction (SBR) too...
Abstract. Intel Xeon Phi is a recently released high-performance co-processor which features 61 core...
Abstract. The objective of this paper is to extend, in the context of multicore architectures, the c...
The objective of this paper is to extend, in the context of multicore architectures, the concepts of...
Matrix Factorization (MF) has been widely applied in machine learning and data mining. Due to the la...
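In machine learning, matrix factorization usually means approximating a sparse rating matrix R by a product P Q^T of two thin factor matrices, fitted only on the observed entries. The abstract above is truncated before it describes its method, so the following is only a generic, hedged sketch of the common stochastic gradient descent (SGD) formulation with L2 regularization. The function names, the hyperparameters (`k`, `lr`, `reg`, `epochs`), and the toy data are illustrative assumptions.

```python
import random


def predict(p_u, q_i):
    """Predicted rating: dot product of a user factor and an item factor."""
    return sum(a * b for a, b in zip(p_u, q_i))


def factorize(ratings, n_users, n_items, k=2, lr=0.02, reg=0.01, epochs=500, seed=0):
    """Fit R ~= P Q^T on observed (user, item, rating) triples with SGD.

    Each step descends on (r - p_u . q_i)^2 + reg * (|p_u|^2 + |q_i|^2)
    for a single observed rating.
    """
    rng = random.Random(seed)
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    order = list(ratings)
    for _ in range(epochs):
        rng.shuffle(order)  # visit observed entries in random order
        for u, i, r in order:
            p_u, q_i = P[u], Q[i]
            err = r - predict(p_u, q_i)
            for f in range(k):
                pf, qf = p_u[f], q_i[f]  # use old values for both updates
                p_u[f] += lr * (err * qf - reg * pf)
                q_i[f] += lr * (err * pf - reg * qf)
    return P, Q


def rmse(ratings, P, Q):
    """Root-mean-square error over the given triples."""
    se = sum((r - predict(P[u], Q[i])) ** 2 for u, i, r in ratings)
    return (se / len(ratings)) ** 0.5


if __name__ == "__main__":
    data = [(0, 0, 5.0), (0, 1, 3.0), (0, 2, 1.0), (1, 0, 4.0), (1, 2, 1.0),
            (2, 1, 1.0), (2, 2, 5.0), (3, 0, 1.0), (3, 2, 4.0)]
    P, Q = factorize(data, n_users=4, n_items=3)
    print("training RMSE:", rmse(data, P, Q))
```

Each update touches only one row of P and one row of Q, so the cost per step is O(k) regardless of the matrix dimensions.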
Abstract. One-sided dense matrix factorizations are important computational kernels in many scientific...
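One-sided factorizations apply transformations from only one side of the matrix; LU, QR, and Cholesky (A = L L^T for a symmetric positive definite A) are the standard examples. A minimal unblocked Cholesky sketch in pure Python follows. It is the textbook column-by-column (left-looking) variant, not the blocked or hybrid scheme a high-performance implementation would use, and the name `cholesky` is illustrative.

```python
import math


def cholesky(a):
    """Return lower triangular L with L L^T = a, for symmetric positive definite a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: subtract contributions of previously computed columns.
        d = a[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(d)
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            L[i][j] = (a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L


if __name__ == "__main__":
    print(cholesky([[4.0, 12.0, -16.0], [12.0, 37.0, -43.0], [-16.0, -43.0, 98.0]]))
    # [[2.0, 0.0, 0.0], [6.0, 1.0, 0.0], [-8.0, 5.0, 3.0]]
```

A non-positive pivot `d` signals that the input is not positive definite, which is why the check doubles as a cheap definiteness test.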
We investigate how to leverage the heterogeneous resources of an Asymmetric Multicore Processor (AMP...
We study the performance of dense symmetric indefinite factorizations (Bunch-K...
Low-rank matrices arise in many scientific and engineering computations. Both computational and stor...
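The abstract above is cut off mid-sentence, so the following only illustrates the standard storage and arithmetic argument for low-rank representations, not the paper's own method. An m x n matrix of rank k can be stored as factors U (m x k) and V (n x k) with A = U V^T, using k(m + n) numbers instead of mn, and a product A x can be evaluated as U (V^T x) in about k(m + n) multiply-adds instead of mn. The helper names are hypothetical.

```python
def dense_entries(m, n):
    """Numbers stored (and multiply-adds per matvec) for a dense m x n matrix."""
    return m * n


def low_rank_entries(m, n, k):
    """Numbers stored for factors U (m x k) and V (n x k) with A = U V^T."""
    return k * (m + n)


def low_rank_matvec(U, V, x):
    """Compute (U V^T) x as U (V^T x) without forming the m x n matrix."""
    k = len(U[0])
    t = [sum(V[j][r] * x[j] for j in range(len(V))) for r in range(k)]  # V^T x
    return [sum(U[i][r] * t[r] for r in range(k)) for i in range(len(U))]  # U t


if __name__ == "__main__":
    print(dense_entries(1000, 1000), low_rank_entries(1000, 1000, 10))  # 1000000 20000
```

The saving is worthwhile only while k(m + n) < mn, that is, while k is well below mn / (m + n).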
Abstract. In this paper, we present a novel algorithm of optimal matrix partitioning for parallel de...
Communicated by Yasuaki Ito. The solution of large-scale dense nonsymmetric eigenvalue problems is require...
Most supercomputers are shipped with both a CPU and a GPU. With the powerful parallel computing capa...
The objective of this paper is to extend and redesign the block matrix reduction applied to the fam...
Abstract. The Chip Multiprocessor (CMP) will be the basic building block for computer systems rangi...