Abstract—FPGA-based acceleration of matrix operations is a promising solution in mobile systems. However, most related work focuses on a certain operation instead of a complete system. In this paper, we explore the possibility of integrating multiple matrix accelerators with a master processor and propose a universal floating-point matrix processor. The processor supports multiple matrix-matrix operations (Level 3 BLAS) and the matrix size is unlimited. The key component of the processor is a shared matrix cache which enables on-chip communication between different accelerators. This structure reduces the external memory bandwidth requirement and improves the overall performance. Considering the performance of the whole system, an asyn-chr...
Abstract—Energy efficiency has emerged as one of the key performance metrics in computing. In this w...
In today's algorithms for sound localization techniques, matrix calculations are ubiquitous. Therefo...
In the last decade floating-point matrix multiplication on FPGAs has been studied extensively and ef...
Part 4: Architecture and Hardware. Matrix computing plays a vital role in many s...
Matrix multiplication is required for a wide variety of applications, including data mining, linear ...
We present two designs (I and II) for IEEE 754 double precision floating point matrix multiplication...
Abstract — In this paper, we introduce a scalable macro-pipelined architecture to perform floating p...
To solve the computational complexity and time-consuming problem of large matrix multiplication, thi...
In hw/sw co-design FPGAs are being used in order to accelerate existing soluti...
We introduce a 64-bit ANSI/IEEE Std 754-1985 floating point design of a hardware matrix multiplier o...
Floating-point matrix multiplication is a basic kernel in scientific computing. It has been shown th...
We see that in most computers and applications the CPU is taxed, first and foremost, before other pi...
This paper describes the implementation of a floating-point matrix-vector multiplication on a reconf...
Matrix operations, like matrix multiplication, are commonly used in almost all areas of scientific r...