Sparse Matrix-Matrix multiplication (SpMM) is a fundamental operation over irregular data and is widely used in graph algorithms, such as finding minimum spanning trees and shortest paths. In this work, we present a hybrid CPU-GPU parallel algorithm that improves the performance of SpMM. First, we improve data locality through element-wise multiplication. Second, we exploit the ordered property of the row indices to sort only partially, instead of fully sorting all triples by row and column index. Finally, a hybrid CPU-GPU approach with a two-level pipelining technique lets our algorithm better exploit a heterogeneous system. Compared with the state-of-the-art SpMM methods in the cuSPARSE and CUSP libraries, our ap...
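The partial-sorting idea in the abstract above can be illustrated with a small sequential sketch. This is not the authors' GPU implementation: the row-list input format, the function name `spmm_row_ordered`, and the expand/sort/compress structure are assumptions made for illustration. The point it shows is that when the intermediate products are generated row by row, the row indices are already in order, so only the column indices within each row need sorting.

```python
from itertools import groupby


def spmm_row_ordered(a_rows, b_rows):
    """Sketch of C = A * B for sparse matrices (hypothetical helper).

    a_rows[i] is a list of (col, val) pairs for row i of A;
    b_rows[k] is the same for row k of B.
    Returns (row, col, val) triples sorted by row, then column.
    """
    result = []
    for i, a_row in enumerate(a_rows):
        # Expansion: multiply each nonzero A[i, k] with every nonzero of B's row k.
        partial = [(j, a_val * b_val)
                   for k, a_val in a_row
                   for j, b_val in b_rows[k]]
        # Partial sort: rows are visited in order, so sort by column only.
        partial.sort(key=lambda t: t[0])
        # Compression: sum products that land on the same (row, col) position.
        for j, group in groupby(partial, key=lambda t: t[0]):
            result.append((i, j, sum(v for _, v in group)))
    return result


# A = [[1, 0, 2], [0, 3, 0]], B = [[0, 4], [5, 0], [6, 7]]
A = [[(0, 1.0), (2, 2.0)], [(1, 3.0)]]
B = [[(1, 4.0)], [(0, 5.0)], [(0, 6.0), (1, 7.0)]]
C = spmm_row_ordered(A, B)
```

Each row's intermediate list is sorted independently, which is cheaper than one global sort over all (row, column) pairs and leaves the rows free to be processed in parallel.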
Matrix decomposition plays an increasingly significant role in many scientific and engineering appli...
Sparse linear algebra algorithms typically perform poorly on superscalar, general-purpose processors...
Discrete Event Simulation on GPUs employing parallel heap data structure is the focus of this thesis...
Sorting is an important problem in computing that has a rich history of investigation by various res...
Modern GPUs are complex, massively multi-threaded, and high-performance. Programmers naturally gravi...
Sparse Lower-Upper (LU) Triangular Decomposition is important to many different applications, includi...
With the continued development of computation and communication technologies, we are overwhelmed wit...
With the advent of GPGPU, many applications are being accelerated by using CUDA programming...
Graphs are a common representation in many problem domains, including engineering, finance, medicine...
The main objective of this thesis is to propose new methods for designing high-performance embedded ...
The growing importance of geospatial databases has made it essential to perform complex spatial que...
Thesis (Master) -- University of Cyprus, Faculty of Pure and Applied Sciences, Department of Compute...
There are hundreds of papers on accelerating sparse matrix-vector multiplication (SpMV); however, on...
This dissertation deals with developing parallel processing algorithms for Graphic Processing Unit (...