The generic matrix multiply (GEMM) function is the core element of high-performance linear algebra libraries used in many computationally demanding digital signal processing (DSP) systems. We propose an acceleration technique for GEMM based on dynamically adjusting the imprecision (distortion) of computation. Our technique applies adaptive scalar companding and rounding to input matrix blocks, followed by two forms of packing in floating point that allow multiple results to be calculated concurrently. Since the adaptive companding process controls the increase of concurrency (via packing), the increase in processing throughput (and the corresponding increase in distortion) depends on the input data statistics. To demonstrate this, we derive ...
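The idea of rounding inputs and then packing them so that one floating-point operation yields several results can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `compand_round` and `packed_dot2`, the fixed scale `step` (standing in for the adaptive companding), and the packing offset `shift` are all assumptions made for illustration. The sketch computes two integer dot products in a single pass of double-precision multiply-adds.

```python
def compand_round(values, step):
    """Scale each value by 1/step and round to the nearest integer.

    A fixed step is an assumption here; an adaptive scheme would choose
    it per matrix block from the data statistics.
    """
    return [round(v / step) for v in values]


def packed_dot2(a_row1, a_row2, b_col, shift=2.0 ** 20):
    """Compute dot(a_row1, b_col) and dot(a_row2, b_col) together.

    Inputs are assumed to be small integers. The results are exact only
    while |dot(a_row1, b_col)| < shift / 2 and the packed accumulator
    stays below 2**53 in magnitude (the double-precision integer range).
    """
    acc = 0.0
    for a1, a2, b in zip(a_row1, a_row2, b_col):
        packed = a1 + a2 * shift  # two integer entries in one double
        acc += packed * b         # one multiply-add updates both sums
    high = round(acc / shift)     # recover the second dot product
    low = acc - high * shift      # remainder is the first dot product
    return int(low), int(high)


if __name__ == "__main__":
    row1 = compand_round([0.26, -0.49, 0.77], step=0.25)
    row2 = compand_round([1.01, 1.24, -1.52], step=0.25)
    col = [7, 8, 9]
    print(packed_dot2(row1, row2, col))
```

A coarser `step` produces smaller integers, which leaves room to pack more values per double at the cost of higher distortion; this is the throughput and precision trade-off the abstract describes.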
This work comprises two different projects in numerical linear algebra. The first project is a...
On modern multi-core, many-core, and heterogeneous architectures, floating-point co...
Edge computing brings artificial intelligence algorithms and graphics processing units closer to dat...
Generic matrix multiplication (GEMM) and one-dimensional convolution/cross-correlation (CON...
This paper presents results of our study on double-precision general matrix-matrix multiplic...
The remarkable positive impact of Deep Neural Networks on many Artificial Intelligence (AI) tasks ha...
Due to non-associativity of floating-point operations and dynamic scheduling on parallel architectur...
Achieving high performance while reducing power consumption is the key question as technology scali...
Mr. Goto wrote code that greatly improved GEMM and was once the fastest program in the world. In...
A current trend in high-performance computing is to decompose a large linear algebra problem into ba...
During the last half-decade, a number of research efforts have centered around developing software f...
Recent developments in computational sciences, involving both hardware and software, allow...
We present an interface and an implementation of the General Matrix Multiply (GEMM) routine for mult...