Abstract. LSQR (Sparse Equations and Least Squares) is a widely used Krylov subspace method for solving large-scale linear systems in seismic tomography. This paper presents a parallel MPI-CUDA implementation of the LSQR solver. At the CUDA level, our contributions are: (1) using CUBLAS and CUSPARSE to compute the major steps of LSQR; (2) optimizing memory copies between host memory and device memory; (3) developing a CUDA kernel that performs transpose SpMV without transposing the matrix in memory or keeping an additional copy. At the MPI level, our contributions are: (1) decomposing both the matrix and the vector to increase parallelism; (2) designing a static load-balancing strategy. In our experiments, the single-GPU code achieves up to 17.6x speedup with 15.7 GFlops in...
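To illustrate what "transpose SpMV without transposing the matrix" can mean, the following is a minimal sequential sketch, not the paper's CUDA kernel. It assumes the matrix is stored in CSR format (the abstract does not state the storage format), and the function name `csr_transpose_spmv` and its parameters are hypothetical. Each stored entry A[i, j] is scattered into output position j, so no transposed copy is ever built.

```python
def csr_transpose_spmv(n_cols, row_ptr, col_idx, values, x):
    """Compute y = A^T x for a CSR matrix A without forming A^T.

    Hypothetical helper for illustration; CSR storage is an assumption.
    """
    y = [0.0] * n_cols
    for i in range(len(row_ptr) - 1):  # walk the rows of A
        xi = x[i]
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Entry A[i, j] with j = col_idx[k] contributes A[i, j] * x[i]
            # to y[j]. In a parallel kernel this scatter would need atomic
            # adds or an equivalent reduction to avoid write conflicts.
            y[col_idx[k]] += values[k] * xi
    return y


# A = [[1, 0, 2],
#      [0, 3, 0]]   (2 rows, 3 columns)
row_ptr = [0, 2, 3]
col_idx = [0, 2, 1]
values = [1.0, 2.0, 3.0]
x = [10.0, 100.0]

y = csr_transpose_spmv(3, row_ptr, col_idx, values, x)
print(y)  # [10.0, 300.0, 20.0]
```

The scatter pattern is the design trade-off: it reuses the single stored copy of the matrix, at the cost of conflicting writes to `y` that a GPU implementation has to resolve.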
Abstract. Graphics Processing Units (GPUs) are massive data parallel processors. High performance co...
In pixel-wise parametric imaging applications, a large amount of experimental data for all image ...
Extended version of EuroGPU symposium article, in the International Conference on Parallel Computing...
Abstract. The Least Squares with QR-factorization (LSQR) method is a widely used Krylov subspace algorithm...
Keywords: Inverse problems; Parallel scientific computing. Abstract. The LSQR algorithm developed by Paige...
IEEE Computer Society. International audience. The main objective of this work consists in analyzing sub...
The original publication is available at www.springerlink.com. International audience. A wide class of g...
Sparse matrix multiplication is a common operation in linear algebra and an important element of oth...
Sparse matrix–vector multiplication (SpMV) is of singular importance in sparse linear algebra, which...
This work shows that it is possible to obtain faster MPI image reconstructions by implementing the a...
Abstract. Linear systems must be solved in many scientific applications, and the solution of t...
Today it is usual to have computational systems formed by a multicore together with one or more GPUs...
The massive parallelism of graphics processing units (GPUs) offers tremendous performance in many hi...
Hierarchically semiseparable (HSS) matrix algorithms are emerging techniques in constructing the sup...