Attaining the best possible throughput when computing convolutions is a challenge for signal and image processing systems, be they HPC (High-Performance Computing) machines or embedded real-time targets. The importance of this problem is highlighted by the numerous methods and implementations available, often optimized for particular settings such as small batched kernels or very large kernels. Meanwhile, GPUs (Graphics Processing Units) have become a first-class architecture for real-time and embedded processing. The power offered by these chips stems from their parallel nature, and this advantage has been exploited for convolutions in several libraries. Even more recently, the introduction of tensor cores on NVIDIA GPUs...
The main contribution of this paper is to show efficient implementations of the convolution-pooling ...
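The abstract above concerns convolution followed by pooling. As a rough illustration of what fusing the two steps means, the pure-Python sketch below assumes a single-channel "valid" cross-correlation (the usual DNN convention) followed by non-overlapping average pooling. The function names are hypothetical, and this is not the paper's GPU implementation. The fused version accumulates each pooled output directly, so the full convolution map is never stored.

```python
# Hypothetical helper names; single channel, valid cross-correlation,
# non-overlapping p x p average pooling (assumptions, not from the paper).

def conv2d_valid(image, kernel):
    """Valid cross-correlation of an H x W image with a K x K kernel."""
    k = len(kernel)
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(k) for b in range(k))
            for j in range(len(image[0]) - k + 1)
        ]
        for i in range(len(image) - k + 1)
    ]


def avg_pool(x, p):
    """Non-overlapping p x p average pooling; incomplete border windows are dropped."""
    return [
        [
            sum(x[i * p + a][j * p + b] for a in range(p) for b in range(p)) / (p * p)
            for j in range(len(x[0]) // p)
        ]
        for i in range(len(x) // p)
    ]


def conv_pool_fused(image, kernel, p):
    """Same result as avg_pool(conv2d_valid(image, kernel), p), but each pooled
    value is accumulated directly, so the full convolution map is never stored."""
    k = len(kernel)
    rows = (len(image) - k + 1) // p
    cols = (len(image[0]) - k + 1) // p
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for a in range(p):
                for b in range(p):
                    y, x = i * p + a, j * p + b  # position in the (virtual) conv map
                    acc += sum(image[y + u][x + v] * kernel[u][v]
                               for u in range(k) for v in range(k))
            out[i][j] = acc / (p * p)
    return out
```

The fused loop performs the same arithmetic as the two-step version; on a GPU the expected benefit is avoiding the intermediate memory traffic, not saving operations.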
Convolution computation is a common operation in deep neural networks (DNNs) and is often responsibl...
We present an implementation of the overlap-and-save method, a method for the convolution of very lo...
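Overlap-and-save is a block convolution scheme: the input is split into blocks that overlap by `len(h) - 1` samples, each block is circularly convolved (typically via an FFT) with the zero-padded filter, and the first `len(h) - 1` outputs of every block, which are corrupted by wrap-around, are discarded. The minimal 1D sketch below assumes a textbook recursive radix-2 FFT, a block size `n_fft` that is a power of two and at least the filter length, and hypothetical helper names. It is not the GPU implementation described in the abstract.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT (len(a) must be a power of two); the inverse is unscaled."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def circular_convolve(a, b):
    """Circular convolution of two equal-length sequences via the convolution theorem."""
    n = len(a)
    spectrum = [x * y for x, y in zip(fft(a), fft(b))]
    return [v.real / n for v in fft(spectrum, invert=True)]


def overlap_save(x, h, n_fft=8):
    """Full linear convolution of x with a short filter h, processed in FFT blocks of n_fft."""
    m = len(h)
    step = n_fft - (m - 1)  # valid output samples produced per block
    if step <= 0:
        raise ValueError("n_fft must be at least len(h)")
    h_pad = list(h) + [0.0] * (n_fft - m)
    out_len = len(x) + m - 1
    # m-1 leading zeros act as the history of the first block; trailing zeros
    # make sure the last block is complete.
    padded = [0.0] * (m - 1) + list(x) + [0.0] * n_fft
    y, start = [], 0
    while len(y) < out_len:
        block = padded[start:start + n_fft]  # overlaps the previous block by m-1 samples
        circ = circular_convolve(block, h_pad)
        y.extend(circ[m - 1:])  # first m-1 samples are wrap-around garbage: discard
        start += step
    return y[:out_len]


def direct_convolve(x, h):
    """Reference O(len(x) * len(h)) linear convolution."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for k, hv in enumerate(h):
            y[i + k] += xv * hv
    return y
```

Each block's transforms are independent of the other blocks, which is what makes the scheme attractive for parallel hardware. For brevity, the sketch recomputes the filter spectrum in every block; a real implementation would compute it once.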
With the increasing sophistication of image processing algorithms, and because of its low computatio...
In this paper, we describe our work on providing a generic yet optimized GPU (CUDA/OpenCL) implement...
Computing many small 2D convolutions using FFTs is a basis for a large number of applications in man...
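To make the FFT route for many small 2D convolutions concrete, the sketch below applies the 2D convolution theorem to a batch of same-size images sharing one kernel. Inputs are zero-padded to the full output size so that circular convolution equals linear convolution, and the kernel spectrum is computed only once. For clarity it uses a naive O(n^2) DFT applied along rows and then columns, where an actual implementation would call an FFT. All names are hypothetical, and this is not the method of the cited work.

```python
import cmath


def dft1d(v, inverse=False):
    """Naive O(n^2) DFT; an FFT routine would normally be used here instead."""
    n = len(v)
    s = 1 if inverse else -1
    return [sum(v[t] * cmath.exp(s * 2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def dft2d(a, inverse=False):
    """2D DFT as 1D transforms along rows, then along columns (unscaled inverse)."""
    rows = [dft1d(r, inverse) for r in a]
    cols = [dft1d(list(c), inverse) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def zero_pad(a, h, w):
    """Embed a in the top-left corner of an h x w array of zeros."""
    return [[a[i][j] if i < len(a) and j < len(a[0]) else 0.0 for j in range(w)]
            for i in range(h)]


def batched_fft_conv2d(images, kernel):
    """Full 2D linear convolution of several same-size images with one kernel.
    The kernel spectrum is computed once and reused across the batch."""
    ih, iw = len(images[0]), len(images[0][0])
    oh, ow = ih + len(kernel) - 1, iw + len(kernel[0]) - 1
    k_hat = dft2d(zero_pad(kernel, oh, ow))
    results = []
    for img in images:
        x_hat = dft2d(zero_pad(img, oh, ow))
        prod = [[x_hat[i][j] * k_hat[i][j] for j in range(ow)] for i in range(oh)]
        y = dft2d(prod, inverse=True)
        results.append([[y[i][j].real / (oh * ow) for j in range(ow)] for i in range(oh)])
    return results


def direct_conv2d_full(img, kernel):
    """Reference full 2D linear convolution."""
    oh, ow = len(img) + len(kernel) - 1, len(img[0]) + len(kernel[0]) - 1
    out = [[0.0] * ow for _ in range(oh)]
    for i, row in enumerate(img):
        for j, v in enumerate(row):
            for a, krow in enumerate(kernel):
                for b, kv in enumerate(krow):
                    out[i + a][j + b] += v * kv
    return out
```

Reusing `k_hat` is the saving when many images share one kernel: only one forward and one inverse transform per image remain.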
The research domain of Multimedia Content Analysis (MMCA) considers all aspects of the automated ext...
Convolution layers are useful for improving the accuracy of neural networks. In the case of networks...
Paper presented at the 2020 IEEE 32nd International Symposium on Computer Architecture and High Perfo...
filtering.
• These kernels have a large amount of data-level parallelism.
• All these applications a...
We designed a new algorithm for 2D convolutions that uses tensor cores. Contra...
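Tensor cores accelerate dense matrix multiply-accumulate, so using them for convolution requires expressing the convolution as matrix products. The abstract is truncated before it explains how the new algorithm does this. For reference only, the sketch below shows the conventional im2col lowering, which is not necessarily the approach of this paper: every K x K patch becomes a row of a patch matrix, every kernel becomes a column of a weight matrix, and one GEMM produces all output maps. The single-channel setting and the function names are assumptions.

```python
def im2col(image, k):
    """Each row holds one k x k patch of the image, flattened row-major (valid positions only)."""
    h, w = len(image), len(image[0])
    return [
        [image[i + a][j + b] for a in range(k) for b in range(k)]
        for i in range(h - k + 1)
        for j in range(w - k + 1)
    ]


def matmul(a, b):
    """Plain triple-loop GEMM: the step that tensor-core MMA instructions would accelerate."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def conv2d_as_gemm(image, kernels):
    """Valid cross-correlation of one image with several k x k kernels via im2col + GEMM.
    Returns one output map per kernel."""
    k = len(kernels[0])
    oh, ow = len(image) - k + 1, len(image[0]) - k + 1
    patches = im2col(image, k)                   # (oh*ow) x (k*k)
    weights = [[kern[a][b] for kern in kernels]  # (k*k) x num_kernels
               for a in range(k) for b in range(k)]
    flat = matmul(patches, weights)              # (oh*ow) x num_kernels
    return [[[flat[i * ow + j][f] for j in range(ow)] for i in range(oh)]
            for f in range(len(kernels))]
```

The patch matrix duplicates each input pixel up to K*K times, and this memory overhead is one reason alternative formulations are studied.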
This paper studies the performance of separable 2D convolution on multi-lane Polymorphic Register Fi...
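A separable 2D kernel is the outer product of a column filter and a row filter, so it can be applied as a horizontal 1D pass followed by a vertical 1D pass. This takes about 2K multiply-adds per output instead of K*K. The sketch below illustrates this property in plain Python with hypothetical function names. It does not model the Polymorphic Register File implementation studied in the paper.

```python
def conv1d_rows(image, taps):
    """Valid 1D cross-correlation of every row with the given taps."""
    k = len(taps)
    return [[sum(row[j + t] * taps[t] for t in range(k)) for j in range(len(row) - k + 1)]
            for row in image]


def transpose(m):
    return [list(r) for r in zip(*m)]


def separable_conv2d(image, col_taps, row_taps):
    """Apply the kernel col_taps (outer) row_taps as two 1D passes."""
    tmp = conv1d_rows(image, row_taps)                       # horizontal pass
    return transpose(conv1d_rows(transpose(tmp), col_taps))  # vertical pass


def conv2d_valid(image, kernel):
    """Reference: direct valid 2D cross-correlation with a full K x K kernel."""
    k = len(kernel)
    return [[sum(image[i + a][j + b] * kernel[a][b] for a in range(k) for b in range(k))
             for j in range(len(image[0]) - k + 1)]
            for i in range(len(image) - k + 1)]
```

With `col_taps = [1, 2, 1]` and `row_taps = [1, 0, -1]`, for example, the two passes reproduce a Sobel-style horizontal gradient kernel.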
I present a novel model for performing 2D Gabor filtering for images on the GPU. Ideally, the model is...
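For reference, the sketch below builds the real part of a 2D Gabor kernel in the common textbook parameterization: Gaussian envelope width sigma, orientation theta, wavelength lam, aspect ratio gamma and phase psi. These parameters are assumed here, not taken from the paper. It then applies the kernel with a direct zero-padded filter. With psi = 0 the kernel is point-symmetric, so cross-correlation and convolution coincide. The function names are hypothetical, and a GPU version would parallelize the per-pixel loop.

```python
import math


def gabor_kernel(size, sigma, theta, lam, gamma=1.0, psi=0.0):
    """Real part of a 2D Gabor kernel sampled on a size x size grid (size odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)   # rotated coordinates
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + gamma * gamma * yr * yr) / (2 * sigma * sigma))
            row.append(envelope * math.cos(2 * math.pi * xr / lam + psi))
        kernel.append(row)
    return kernel


def filter2d_same(image, kernel):
    """Zero-padded 'same'-size cross-correlation (equal to convolution for a
    point-symmetric kernel such as a Gabor kernel with psi = 0)."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    half = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    y, x = i + a - half, j + b - half
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[a][b]
            out[i][j] = acc
    return out
```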
2D convolution is a staple of digital image processing. The advent of large format imagers makes it ...