We revisit a blocked formulation of the direct convolution algorithm that mimics modern realizations of the general matrix multiplication (GEMM), demonstrating that the same approach can be adapted to deliver high performance for deep learning inference on the AI Engine (AIE) tile embedded in Xilinx Versal platforms. Our experimental results on a Xilinx Versal VCK190 show an arithmetic throughput close to 70% of the theoretical peak of the AIE tile for 8-bit integer operands on the convolutional layers arising in ResNet-50 v1.5 trained on ImageNet.
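To make the abstract's central idea concrete, the following is a minimal plain-Python sketch (not the paper's AIE implementation) of a direct convolution whose loops are blocked the way a high-performance GEMM is: the output filters play the role of GEMM's "m" dimension, the output pixels play "n", and the reduction over input channels and kernel window plays "k". All names, sizes, and block factors (`mb`, `nb`) here are illustrative assumptions.

```python
def conv_direct_naive(inp, wts, C, H, W, F, K):
    """Reference direct convolution (valid padding, stride 1).
    inp: C x H x W nested lists; wts: F x C x K x K nested lists."""
    OH, OW = H - K + 1, W - K + 1
    out = [[[0] * OW for _ in range(OH)] for _ in range(F)]
    for f in range(F):
        for i in range(OH):
            for j in range(OW):
                acc = 0  # int32-style accumulator for int8-style operands
                for c in range(C):
                    for r in range(K):
                        for s in range(K):
                            acc += inp[c][i + r][j + s] * wts[f][c][r][s]
                out[f][i][j] = acc
    return out


def conv_direct_blocked(inp, wts, C, H, W, F, K, mb=2, nb=4):
    """Same arithmetic, with GEMM-style blocking:
    mb blocks the output filters (GEMM 'm'), nb blocks the output pixels (GEMM 'n').
    The innermost loops over a (mb x nb) tile are where a real kernel would
    use the AIE vector unit and keep the tile resident in local memory."""
    OH, OW = H - K + 1, W - K + 1
    out = [[[0] * OW for _ in range(OH)] for _ in range(F)]
    pixels = [(i, j) for i in range(OH) for j in range(OW)]
    for f0 in range(0, F, mb):                  # block over output filters
        for p0 in range(0, len(pixels), nb):    # block over output pixels
            for f in range(f0, min(f0 + mb, F)):
                for (i, j) in pixels[p0:p0 + nb]:
                    acc = 0
                    for c in range(C):          # reduction ('k') dimension
                        for r in range(K):
                            for s in range(K):
                                acc += inp[c][i + r][j + s] * wts[f][c][r][s]
                    out[f][i][j] = acc
    return out


if __name__ == "__main__":
    # Deterministic toy operands in a small int8-like range.
    C, H, W, F, K = 2, 4, 4, 3, 3
    inp = [[[((c + i + j) % 5) - 2 for j in range(W)]
            for i in range(H)] for c in range(C)]
    wts = [[[[((7 * f + 3 * c + r + s) % 5) - 2 for s in range(K)]
             for r in range(K)] for c in range(C)] for f in range(F)]
    assert conv_direct_blocked(inp, wts, C, H, W, F, K) == \
        conv_direct_naive(inp, wts, C, H, W, F, K)
    print("blocked result matches naive reference")
```

Blocking does not change the arithmetic, only the loop order, which is why the two routines agree exactly; the payoff on real hardware is that each (mb x nb) output tile and the operands feeding it fit in the AIE tile's local memory.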