Achieving high performance and performance portability for large-scale scientific applications is a major challenge on heterogeneous computing systems such as many-core CPUs and accelerators like GPUs. In this work, we implement a widely used block eigensolver, Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG), using two popular directive-based programming models (OpenMP and OpenACC) for GPU-accelerated systems. Our work differs from existing work in that it adopts a holistic approach that optimizes the full solver performance rather than narrowing the problem into small kernels (e.g., SpMM, SpMV). Our LOBPCG GPU implementation achieves a $2.8\times$–$4.3\times$ speedup over an optimized CPU implementation when test...
Sparse matrix vector multiplication (SpMV) is the dominant kernel in scientific simulations....
Graphics Processing Units (GPU) have been widely adopted to accelerate the execution of HPC workload...
We consider a fast, robust and scalable solver using graphic processing units (GPU) as accelerators ...
The Conjugate Gradient (CG) method is a widely-used iterative method for solving linear systems desc...
The proliferation of accelerators in modern clusters makes efficient coprocessor programming a key r...
A wide class of numerical methods needs to solve a linear system, whe...
The limiting factor for efficiency of sparse linear solvers is the memory bandwidth. In th...
In this article the preconditioned conjugate gradient (PCG) method, realized on GPU and intended to ...
We present the design and optimization of a linear solver on General Purpose GPUs for the efficient ...
Whereas most today parallel High Performance Computing (HPC) software is writt...
We present an implementation of Two-Level Preconditioned Conjugate Gradient Method for the GPU. We i...
The sparse Matrix-Vector multiplication is a key operation in science and engineering along with th...
We propose a parallel implementation of the Preconditioned Conjugate Gradient algorithm on a...
The parallelization of numerical simulation algorithms, i.e., their adaptation...