Communicated by Yasuaki Ito

Solving large-scale dense nonsymmetric eigenvalue problems is required in many areas of scientific and engineering computing, such as vibration analysis of automobiles and analysis of electronic diffraction patterns. In this study, we focus on the Hessenberg reduction step and consider accelerating it in a hybrid CPU-GPU computing environment. Because the Hessenberg reduction algorithm consists almost entirely of BLAS (Basic Linear Algebra Subprograms) operations, we propose three approaches for distributing the BLAS operations between the CPU and the GPU. Among them, the third approach, which assigns small-size BLAS operations to the CPU and distributes large-size BLAS operations between the CPU and the GPU in some opti...
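The third approach described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract is truncated before it states how the split is chosen, so the names `hybrid_matvec`, `small_threshold`, and `gpu_share` are hypothetical, and both "devices" are simulated by the same pure-Python routine. The sketch only shows the control flow: small operations stay on the CPU, and large ones are partitioned by rows between the two devices.

```python
def matvec(rows, x):
    """Dense matrix-vector product y = A x, with A given as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in rows]


def hybrid_matvec(A, x, small_threshold=4, gpu_share=0.75):
    """Simulated CPU/GPU distribution of one matrix-vector product.

    small_threshold and gpu_share are assumed tuning parameters,
    not values taken from the paper.
    """
    n = len(A)
    if n < small_threshold:
        # Small operation: keep it entirely on the CPU.
        return matvec(A, x), {"cpu_rows": n, "gpu_rows": 0}

    # Large operation: the leading k rows go to the (simulated) GPU,
    # the remaining rows to the CPU.
    k = max(0, min(n, round(n * gpu_share)))
    y_gpu = matvec(A[:k], x)  # stands in for a GPU BLAS call
    y_cpu = matvec(A[k:], x)  # stands in for a CPU BLAS call
    return y_gpu + y_cpu, {"cpu_rows": n - k, "gpu_rows": k}


if __name__ == "__main__":
    A = [[i + j for j in range(8)] for i in range(8)]
    x = list(range(8))
    y, placement = hybrid_matvec(A, x)
    print(y == matvec(A, x), placement)
```

In a real hybrid setting the two partial products would run concurrently, and the share given to each device would be tuned to their relative speeds and to the data transfer cost. Here the two calls run one after the other purely to keep the example self-contained.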
As modern massively parallel clusters are getting larger with beefier compute nodes, traditional par...
This paper explores the early implementation of high-performance routines for the solution of multip...
In the nonsymmetric eigenvalue problem, work has focused on the Hessenberg reduction and QR iteratio...
In this paper, we present the StarNEig library for solving dense nonsymmetric standard and generaliz...
In this paper, we present an algorithm for the reduction to block upper-Hessenberg form which can be...
Implementations of the Basic Linear Algebra Subprograms (BLAS) interface are major buildin...
We investigate the performance of the routines in LAPACK and the Successive Band Reduction (SBR) too...
As a recurrent problem in numerical analysis and computational science, eigenvector and eigenvalue d...
The solution of (generalized) eigenvalue problems for symmetric or Hermitian matrices is a common su...
We describe two techniques for speeding up eigenvalue and singular value computations on shared memo...
This work deals with the solution of large non-Hermitian linear systems on desktop workstations with...
We present several algorithms to compute the solution of a linear system of equations on a GPU, as ...