The graphics processing unit (GPU) was initially designed for raster-based graphics computations, but marked improvements in performance and programmability have generated considerable interest in it as a high-performance computing platform. GPU hardware design places a large number of low-power cores on a single chip, as opposed to multiple high-power chips. This configuration has a powerful effect on locality: data can be processed without the throughput costs of spilling it to buffers in the higher levels of the memory hierarchy. However, throughput-oriented algorithms must now explicitly express and optimize memory access. This dissertation explores the design and deployment of memory-efficient tensor product representations for through...
We present in this paper several implementations of the 3D Fast Wavelet Transform (3D-FWT) on multic...
Today's computer systems often contain several different processing units aside from the CPU. Among...
The Discrete Wavelet Transform (DWT) has a wide range of applications from signal processing to vide...
Tensor algorithms are a rapidly growing field of research with applications in many scientific domai...
To respond to the intense computational load of deep neural networks, a plethora of domain-specific ...
Computationally intensive applications such as pattern recognition and natural language processing, a...
GPUs have recently attracted our attention as accelerators on a wide variety of algorithms, ...
We present a technique for designing memory-bound algorithms with high data reuse on Graphics Proces...
We present the design and optimization of a linear solver on General Purpose GPUs for the efficient ...
Tensor algebra lives at the heart of big data applications. Where classical machine learnin...
There has been a surge in the demand for domain-specific architectures due to wide-ranging deep lea...
The release of the CUDA Kepler architecture in March 2012 has provided Nvidia GPUs with a larger reg...
Latent Semantic Analysis (LSA) aims to reduce the dimensions of large term-document datasets using S...
Graphics processing units (GPUs) have recently attracted attention for scientific applications such...
The 2-D discrete wavelet transform (DWT) lies at the heart of many image-processing algorith...