We present a GPU-tailored implementation of the overlap-and-save method, a technique for convolving very long signals with short response functions. Using the CUDA programming language, we have implemented several FFT algorithms that exploit GPU shared memory, enabling GPU-accelerated convolution. We compare our implementation with an implementation of the overlap-and-save algorithm that uses the NVIDIA FFT library (cuFFT). We demonstrate that a shared-memory-based FFT achieves significant speed-ups for certain problem sizes and lowers the memory requirements of the overlap-and-save method on GPUs.
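To make the overlap-and-save idea concrete, here is a minimal CPU sketch in Python. It rests on several assumptions. The signal is real-valued. The FFT size is a power of two, at least as long as the filter. A textbook recursive radix-2 FFT stands in for the shared-memory GPU FFTs described above. The function names (fft, ifft, overlap_save, direct_convolve) are hypothetical and illustrative; they are not the paper's code and not the cuFFT API. The sketch splits the signal into overlapping segments of n_fft samples. Each segment is multiplied by the filter spectrum in the frequency domain. The first len(h) - 1 outputs of every segment are then discarded, because circular wrap-around corrupts them.

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor exp(-2*pi*i*k/n) applied to the odd-indexed half.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(spectrum):
    """Inverse FFT via the identity ifft(X) = conj(fft(conj(X))) / n."""
    n = len(spectrum)
    y = fft([v.conjugate() for v in spectrum])
    return [v.conjugate() / n for v in y]


def overlap_save(signal, h, n_fft):
    """Full linear convolution of a long real signal with a short filter h."""
    m = len(h)
    if m == 0 or n_fft <= 0 or n_fft & (n_fft - 1) or n_fft < m:
        raise ValueError("n_fft must be a power of two with n_fft >= len(h) >= 1")
    step = n_fft - (m - 1)  # valid output samples produced per segment
    # The filter spectrum is computed once and reused for every segment.
    H = fft(list(h) + [0.0] * (n_fft - m))
    # Prepend m-1 zeros so the first segment sees an all-zero history.
    padded = [0.0] * (m - 1) + list(signal)
    out_len = len(signal) + m - 1  # length of the full linear convolution
    y = []
    pos = 0
    while len(y) < out_len:
        seg = padded[pos:pos + n_fft]
        seg += [0.0] * (n_fft - len(seg))  # zero-fill past the end of the signal
        block = ifft([a * b for a, b in zip(fft(seg), H)])
        # The first m-1 samples are corrupted by circular wrap-around: discard them.
        y.extend(v.real for v in block[m - 1:])
        pos += step
    return y[:out_len]


def direct_convolve(x, h):
    """Reference O(len(x) * len(h)) linear convolution for checking results."""
    out = [0.0] * (len(x) + len(h) - 1)
    for i, a in enumerate(x):
        for j, b in enumerate(h):
            out[i + j] += a * b
    return out


if __name__ == "__main__":
    x = [float(v) for v in range(1, 11)]
    h = [1.0, -1.0, 0.5]
    print([round(v, 6) for v in overlap_save(x, h, 8)])
    print(direct_convolve(x, h))
```

Both printed lists should agree up to floating-point rounding. The segments' forward FFTs, pointwise products and inverse FFTs are independent of one another, so the method parallelizes naturally across segments. The filter spectrum H is computed only once, whatever the signal length.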