An MPI-CUDA Implementation and Optimization for Parallel Sparse Equations and Least Squares (LSQR)

Huang, He
Wang, Liqiang
Lee, En-Jui
Chen, Po

Publication date

December 2012

DOI

10.1016/j.procs.2012.04.009

Publisher

Published by Elsevier B.V.

Abstract

LSQR (Sparse Equations and Least Squares) is a widely used Krylov subspace method for solving large-scale linear systems in seismic tomography. This paper presents a parallel MPI-CUDA implementation of the LSQR solver. At the CUDA level, our contributions are: (1) using CUBLAS and CUSPARSE to compute the major steps of LSQR; (2) optimizing memory copies between host memory and device memory; (3) developing a CUDA kernel that performs transpose SpMV without transposing the matrix in memory or keeping an additional copy. At the MPI level, our contributions are: (1) decomposing both the matrix and the vector to increase parallelism; (2) designing a static load balancing strategy. In our experiment, the single GPU code achieves up to 17.6x speedup with 15.7 GFlops in...
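
The idea of a transpose SpMV that never materializes the transposed matrix can be illustrated with a small sequential sketch. This is not the authors' CUDA kernel, whose details are not given in this record; it assumes the matrix is stored in CSR format, and the function name `csr_transpose_matvec` and its parameters are hypothetical. Each stored entry A[i, j] is scattered into y[j], so y = A^T x is accumulated directly from the original row-wise storage.

```python
def csr_transpose_matvec(n_cols, indptr, indices, data, x):
    """Compute y = A^T x for a CSR matrix A without building A^T.

    indptr, indices, data are the usual CSR arrays of A (assumed layout);
    x has one entry per row of A; the result has one entry per column.
    """
    y = [0.0] * n_cols
    for row in range(len(indptr) - 1):
        x_row = x[row]
        # Walk the nonzeros of this row and scatter them by column index.
        for k in range(indptr[row], indptr[row + 1]):
            y[indices[k]] += data[k] * x_row
    return y


# A = [[1, 0, 2],
#      [0, 3, 0]]  stored in CSR form
indptr = [0, 2, 3]
indices = [0, 2, 1]
data = [1.0, 2.0, 3.0]
y = csr_transpose_matvec(3, indptr, indices, data, [1.0, 2.0])
```

In a parallel setting, different rows can update the same y[j], so a GPU version of this scatter pattern would typically need atomic additions or a reduction step; how the paper's kernel handles this is not stated here.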
