One can simulate low-precision floating-point arithmetic via software by executing each arithmetic operation in hardware and then rounding the result to the desired number of significant bits. For IEEE-compliant formats, rounding requires only standard mathematical library functions, but handling subnormals, underflow, and overflow demands special attention, and numerical errors can cause mathematically correct formulae to behave incorrectly in finite arithmetic. Moreover, the ensuing implementations are not necessarily efficient, as the library functions these techniques build upon are typically designed to handle a broad range of cases and may not be optimized for the specific needs of rounding algorithms. CPFloat is a C library for simul...
With the ever-increasing need for computation of scientific applications, new ...
We introduce an algorithm for multiplying a floating-point number $x$ by a constant $C$ that is not ...
Low-precision floating-point arithmetic can be simulated via software by executing each arithmetic o...
The half precision (fp16) floating-point format, defined in the 2008 revision of the IEEE standard f...
Floating-point numbers have an intuitive meaning when it comes to physics-based numerical computatio...
We demonstrate tools and methods for proofs about the correctness and numerical accuracy of C progra...
Most mathematical formulae are defined in terms of operations on real numbers, but compute...
Floating-point arithmetic is considered an esoteric subject by many people. This is rather surprisin...
This paper presents a multiple-precision binary floating-point library, writte...
The widely implemented and used IEEE-754 Floating-point specification defines a method by which floa...
This handbook is a definitive guide to the effective use of modern floating-point arithmetic, which ...
Floating-Point (FP) units in processors are generally limited to supporting a ...