Low-precision matrix multiplication has attracted significant interest in the research community due to its applicability to quantized neural networks. Because fixed-precision hardware under-utilizes its resources when operands have low and varying precision, a multitude of variable-precision hardware designs have been proposed. Bit-serial hardware exploits the frugal nature of bit-serial computation, which operates on only as many bits as necessary. A bit-serial matrix multiplication can be expressed as a weighted sum of binary matrix multiplications. In this work, we study the inherent locality of bit-serial matrix multiplications and propose a locality-aware scheduling algorithm...
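As a concrete illustration of this decomposition (a minimal sketch, not the scheduler or hardware described above), the NumPy snippet below assumes unsigned fixed-point operands and rebuilds the product as a weighted sum of binary bit-plane multiplications; the function name `bit_serial_matmul` and the 8-bit operand widths are illustrative assumptions, not details taken from this work.

```python
import numpy as np

def bit_serial_matmul(A, B, bits_a=8, bits_b=8):
    """Compute A @ B for unsigned integer matrices as a weighted sum of
    binary (0/1) matrix multiplications, one per pair of bit-planes."""
    acc = np.zeros((A.shape[0], B.shape[1]), dtype=np.int64)
    for p in range(bits_a):
        A_p = (A >> p) & 1                    # binary matrix: p-th bit of every entry of A
        for q in range(bits_b):
            B_q = (B >> q) & 1                # binary matrix: q-th bit of every entry of B
            acc += (A_p @ B_q) << (p + q)     # binary product weighted by 2^(p+q)
    return acc

# Sanity check against the direct product for random 8-bit operands.
rng = np.random.default_rng(0)
A = rng.integers(0, 256, size=(4, 6), dtype=np.int64)
B = rng.integers(0, 256, size=(6, 5), dtype=np.int64)
assert np.array_equal(bit_serial_matmul(A, B), A @ B)
```

Each of the bits_a × bits_b binary products accumulates into the same output block, which hints at the kind of data reuse a locality-aware schedule can exploit.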