Numerical codes that require arbitrary precision floating point (APFP) numbers for their core computation are dominated by elementary arithmetic operations due to the superlinear complexity of multiplication in the number of mantissa bits. APFP computations on conventional software-based architectures are made exceedingly expensive by the lack of native hardware support, requiring elementary operations to be emulated using instructions operating on machine-word-sized blocks. In this work, we show how APFP multiplication on compile-time fixed-precision operands can be implemented as deep FPGA pipelines with a recursively defined Karatsuba decomposition on top of native DSP multiplication. When comparing our design implemented on an Alveo U25...
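The recursive Karatsuba decomposition named in the abstract above can be illustrated in software. The following is a minimal sketch on plain non-negative integers standing in for mantissas. It is not the paper's FPGA pipeline: `BASE_BITS`, `native_mul`, and `karatsuba` are hypothetical names chosen for illustration, and the 18-bit base width is an assumption, not a figure taken from the source.

```python
# Illustrative sketch only: a software model of recursive Karatsuba
# multiplication that bottoms out in a "native" small multiplication.
# BASE_BITS is an assumed width, not a value from the paper.
BASE_BITS = 18


def native_mul(a: int, b: int) -> int:
    """Stand-in for a hardware multiplier on small operands (hypothetical)."""
    return a * b


def karatsuba(a: int, b: int, bits: int) -> int:
    """Multiply non-negative integers a and b, each at most `bits` bits wide."""
    if bits <= BASE_BITS:
        return native_mul(a, b)

    half = bits // 2            # width of the low parts
    hi_bits = bits - half       # width of the high parts
    mask = (1 << half) - 1

    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half

    # Three recursive products instead of the four of the schoolbook method.
    z0 = karatsuba(a_lo, b_lo, half)
    z2 = karatsuba(a_hi, b_hi, hi_bits)
    # The sums of the parts can be one bit wider than the high parts.
    z1 = karatsuba(a_lo + a_hi, b_lo + b_hi, hi_bits + 1) - z0 - z2

    # Recombine: a * b = z2 * 2^(2*half) + z1 * 2^half + z0
    return (z2 << (2 * half)) + (z1 << half) + z0


if __name__ == "__main__":
    x = (1 << 511) + 12345
    y = (1 << 512) - 1
    print(karatsuba(x, y, 512) == x * y)
```

Because the operand width is passed in explicitly, the recursion depth is determined by `bits` alone. This loosely mirrors the compile-time fixed-precision setting the abstract describes, where the decomposition can be fully unrolled ahead of time.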
This article addresses the development of complex, heavily parameterized and flexible operators to b...
FPGAs are increasingly being used in the high performance and scientific computing community to impl...
Floating point arithmetic is a common requirement in signal processing, image processing and real ti...
High speed computation is the need of today's generation of Processors. To accomplish this maj...
It has been shown that FPGAs could outperform high-end microprocessors on floating-point computation...
We present two designs (I and II) for IEEE 754 double precision floating point matrix multiplication...
We see that in most computers and applications the CPU is taxed, first and foremost, before other pi...
We disclose hardware (HW) intrinsic CPU or DSP instructions architecture and microarchitecture that ...
Many scenarios demand a high processing power often combined with a limited energy budget. A way to ...
In this paper, we introduce a scalable macro-pipelined architecture to perform floating p...
Many computationally intensive scientific applications involve repetitive floating point operations ...
This paper presents some work in progress on the development of fast and accur...
On modern multi-core, many-core, and heterogeneous architectures, floating-poi...
In the last decade floating-point matrix multiplication on FPGAs has been studied extensively and ef...