Abstract: Generic matrix multiplication (GEMM) and one-dimensional convolution/cross-correlation (CONV) kernels often constitute the bulk of the compute- and memory-intensive processing within image/audio recognition and matching systems. We propose a novel method to scale the energy and processing throughput of GEMM and CONV kernels for such error-tolerant multimedia applications by adjusting the precision of computation. Our technique applies linear projections to the input matrix or signal data during the top-level GEMM and CONV blocking and reordering. The GEMM and CONV kernel processing then uses the projected inputs, and the results are accumulated to form the final outputs. Throughput and energy scaling takes place by changing the nu...
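The project-then-accumulate idea described in the abstract can be pictured with the sketch below. It is an illustration under stated assumptions, not the authors' kernel: the orthonormal DCT-II basis, the function names `dct_basis` and `projected_gemm`, and the parameter `k` are all hypothetical choices made for this example and do not come from the text. Each step projects the inner dimension of both inputs onto one basis vector, multiplies the projected inputs, and accumulates the result. Using every basis vector recovers the exact product; using fewer gives an approximation.

```python
import math


def dct_basis(n):
    """Return n orthonormal DCT-II basis vectors of length n (hypothetical choice of basis)."""
    rows = []
    for i in range(n):
        scale = math.sqrt(1.0 / n) if i == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (j + 0.5) * i / n) for j in range(n)])
    return rows


def projected_gemm(a, b, k):
    """Approximate the product a*b by accumulating k terms (a p_i)(p_i^T b).

    a is m x n, b is n x r, both as lists of rows. With k == n the basis is
    complete, so the accumulated sum equals the exact product up to rounding.
    """
    m, n, r = len(a), len(b), len(b[0])
    basis = dct_basis(n)
    c = [[0.0] * r for _ in range(m)]
    for p in basis[:k]:
        # Project the rows of a onto p: one scalar per output row.
        ap = [sum(a[x][j] * p[j] for j in range(n)) for x in range(m)]
        # Project the columns of b onto p: one scalar per output column.
        pb = [sum(p[j] * b[j][y] for j in range(n)) for y in range(r)]
        # Accumulate the rank-one contribution of this projection.
        for x in range(m):
            for y in range(r):
                c[x][y] += ap[x] * pb[y]
    return c
```

In this sketch, a smaller `k` does less work at the price of accuracy. Whether `k` corresponds to the quantity the truncated abstract goes on to name is not confirmed by the text here.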
In this paper, a chip that performs real-time image convolutions with programmable kernels of arbit...
In the last few years, dynamically configurable approximate multipliers have been explored to tune t...
The objective of this thesis is to compare the suitability of FPGAs, GPUs and DSPs for digital image...
The generic matrix multiply (GEMM) function is the core element of high-performance linear algebra l...
Paper presented at the 2020 IEEE 32nd International Symposium on Computer Architecture and High Perfo...
121 p. Thesis (Ph.D.), University of Illinois at Urbana-Champaign, 2003. We next apply the above findi...
This paper presents results of our study on double-precision general matrix-matrix multiplic...
The remarkable positive impact of Deep Neural Networks on many Artificial Intelligence (AI) tasks ha...
Edge computing brings artificial intelligence algorithms and graphics processing units closer to dat...
In this paper we analyze the use of Decision Tree Grafting, Blocking and Loop Unfolding to improve...
Attaining the best possible throughput when computing convolutions is a challe...
We revisit a blocked formulation of the direct convolution algorithm that mimics modern realizations...
We introduce a high performance, multi-threaded realization of the gemm kernel for the ARMv8.2 ...
Modern neuromorphic deep learning techniques, as well as unsupervised techniques like the locally co...