This paper presents a comprehensive hardware accelerator architecture for YOLOv3-Tiny, targeted at low-cost FPGAs, with a high frame rate, high accuracy, and low latency. The proposed accelerator implements all YOLO layers in hardware, including the zero-padding, convolution, leaky ReLU, batch normalization, max-pooling, and up-sampling layers. The architecture is a hybrid of the data flow and control flow paradigms. Data preparation and computation run asynchronously under the data flow paradigm, while the overall process is governed by a proposed custom instruction set that follows the control flow paradigm. The principle of General Matrix Multiplication (GEMM) is adopted to co...
The dissemination of multi-core architectures and the later irruption of massively parallel devices,...
Matrix multiplication is a computationally intensive problem and a prerequisite in various image...
Matrix multiplication is at the core of high-performance numerical computation. Software methods of ...
Matrix multiplication is required for a wide variety of applications, including data mining, linear ...
To solve the computational complexity and time-consuming problem of large matrix multiplication, thi...
In this paper, we introduce a scalable macro-pipelined architecture to perform floating p...
Matrix operations, like matrix multiplication, are commonly used in almost all areas of scientific r...
In hw/sw co-design, FPGAs are being used in order to accelerate existing soluti...
One of the key kernels in scientific applications is the Sparse Matrix Vector Multiplication (SMVM)....
With diminishing performance improvement from general-purpose processors and reducing cost for prog...
This paper presents an FPGA accelerator based on circular buffer unit per orie...
This paper presents an architecture for matrix multiplication optimized to be integrated as an accelera...
Matrix computing plays a vital role in many s...
We present two designs (I and II) for IEEE 754 double precision floating point matrix multiplication...
Convolutional Neural Networks (CNNs) have attained high accuracy and have been widely employed in ima...