This thesis implements sparse matrix-vector multiplication (SpMV) on an eight-core Digital Signal Processor (DSP) and gives insights into how to optimize matrix multiplication on a DSP for high energy efficiency. We used two sparse matrix formats: Compressed Sparse Row (CSR) and Block Compressed Sparse Row (BCSR). We applied loop unrolling to the naive algorithm and, in addition, implemented register-blocked and cache-blocked sparse matrix-vector multiplication to optimize it. Loop unrolling gave a promising improvement in computational performance (≈12%). With this optimization, we observed a decrease in power usage (0.3 W) when using a matrix size of 600 an...
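As an illustration of the naive CSR kernel and its loop-unrolled variant mentioned in the abstract, here is a minimal sketch in C. It is not the thesis code: the function names `spmv_csr` and `spmv_csr_unroll4`, the unroll factor of 4, the `float` element type, and the array names `row_ptr`, `col_idx`, and `val` are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Naive CSR SpMV: y = A * x.
 * row_ptr has n_rows + 1 entries; the nonzeros of row i are stored in
 * val[row_ptr[i] .. row_ptr[i+1]-1], with column indices in col_idx. */
void spmv_csr(size_t n_rows, const size_t *row_ptr, const size_t *col_idx,
              const float *val, const float *x, float *y)
{
    assert(row_ptr && col_idx && val && x && y);
    for (size_t i = 0; i < n_rows; ++i) {
        float sum = 0.0f;
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            sum += val[k] * x[col_idx[k]];
        y[i] = sum;
    }
}

/* Same product with the inner loop unrolled by 4 (hypothetical factor).
 * Four independent accumulators expose more instruction-level
 * parallelism; a scalar remainder loop handles the leftover nonzeros. */
void spmv_csr_unroll4(size_t n_rows, const size_t *row_ptr,
                      const size_t *col_idx, const float *val,
                      const float *x, float *y)
{
    assert(row_ptr && col_idx && val && x && y);
    for (size_t i = 0; i < n_rows; ++i) {
        size_t k = row_ptr[i];
        const size_t end = row_ptr[i + 1];
        float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;

        for (; k + 4 <= end; k += 4) {
            s0 += val[k]     * x[col_idx[k]];
            s1 += val[k + 1] * x[col_idx[k + 1]];
            s2 += val[k + 2] * x[col_idx[k + 2]];
            s3 += val[k + 3] * x[col_idx[k + 3]];
        }

        float sum = (s0 + s1) + (s2 + s3);
        for (; k < end; ++k)            /* remainder: fewer than 4 left */
            sum += val[k] * x[col_idx[k]];
        y[i] = sum;
    }
}
```

The unrolled version changes the order of the floating-point additions, so its results can differ from the naive kernel in the last bits for general inputs. Whether unrolling pays off on a given DSP depends on the compiler and on the number of nonzeros per row.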
As a fundamental operation, sparse matrix-vec...
In this paper we present a new technique for sparse matrix multiplication on vector multiprocessors ...
On multicore architectures, the ratio of peak memory bandwidth to peak floating-point performance (b...
The matrix-vector multiplication operation is the kernel of most numerical algorithms. Typica...
The problem of obtaining high computational throughput from sparse matrix multiple-vector multiplic...
The sparse matrix-vector multiplication (SpMV) is a fundamental kernel used in computational...
Sparse computations are ubiquitous in computational codes, with the sparse matrix-vector (SpMV) mult...
This work is a continuation and augmentation of previous energy studies of Compressed Sparse eXtended...
The design and implementation of a sparse matrix-matrix multiplication architecture on FPGAs is pres...
Sparse matrix-vector multiplication is an important computational kernel that tends to per...
In this paper, a new methodology for computing the Dense Matrix Vector Multipl...
The sparse matrix is one of the most important data storage formats for large amounts of data. Sparse ...
Sparse matrix-vector multiplication is an integral part of many scientific algorithms. Several studi...