An emerging trend in processor architecture points to a doubling of the number of cores per chip every two years, at the same or a lower clock speed. Of particular interest to this thesis is the class of many-core processors, which are becoming more attractive due to their high performance, low cost, and low power consumption. The main goal of this dissertation is to develop optimization techniques for mapping algorithms and applications onto CUDA GPUs and CPU-GPU heterogeneous platforms. The fast Fourier transform (FFT) is a fundamental tool in computational science and engineering, so a GPU-optimized implementation is of paramount importance. We first study the mapping of the 3D FFT onto the recent, CUDA GPUs an...
GPGPU Computing using CUDA is rapidly gaining ground today. GPGPU has been brought to the masses thr...
Computing many small 2D convolutions using FFTs is a basis for a large number of applications in man...
This paper explores the performance and energy efficiency of CUDA-enabled GPUs and multi-core SIMD C...
Using two full applications with different characteristics, this thesis explores the performance and...
Graphics processor units (GPUs) have evolved to handle throughput-oriented workloads where a...
Li, Xiaoming. Generating a high-performance Fast Fourier Transform (FFT) library is an important researc...
We present in this paper several implementations of the 3D Fast Wavelet Transform (3D-FWT) on multic...
Li, Xiaoming. Graphics Processing Units (GPUs) have proved to be a promising platform to accelerate ...
Computing on graphics processors is perhaps one of the most important developments in computational sc...
Optimization algorithms are becoming increasingly important in many areas, such as fin...
The fast Fourier transform (FFT) plays an important role in digital signal processing (DSP)...
The Fourier Transform is one of the most influential mathematical equations of our time. The Discret...
As an open, royalty-free framework for writing programs that execute across heterogeneous platforms,...
The number theoretic transform (NTT) permits a very efficient method to perform multiplication of ve...
Computers almost always contain one or more central processing units (CPUs), each of which processes ...