© 2020 IEEE. Existing work-efficient parallel algorithms for floating-point prefix sums exhibit either good performance or good numerical accuracy, but not both. Consequently, prefix-sum algorithms cannot easily be used in scientific-computing applications that require both high performance and accuracy. We have designed and implemented two new algorithms, called CAST_BLK and PAIR_BLK, whose accuracy is significantly higher than that of the high-performing prefix-sum algorithm from the Problem Based Benchmark Suite (PBBS), while running with comparable performance on modern multicore machines. Specifically, the root mean squared error of the PBBS code on a large array of uniformly distributed 64-bit floating-point numbers is 8 times higher than t...
Abstract: The prefix-sum operation, which returns all prefix sums on a sequence of numbers, plays an i...
Improved error signal of the backpropagation (BP) algorithm on single processors has shown a tremend...
Modern high performance computation (HPC) performs a huge amount of floating p...
The problem of exactly summing n floating-point numbers is a fundamental problem that has many appli...
On modern multi-core, many-core, and heterogeneous architectures, floating-point computations, espec...
Parallel prefix computation is perhaps the most frequently used subroutine in parallel algorithms to...
Abstract: "Experienced algorithm designers rely heavily on a set of building blocks and on the tools...
On modern multi-core, many-core, and heterogeneous architectures, floating-point co...
Nowadays, parallel computing is ubiquitous in several application fields, both in engineering and sc...
Abstract: We are interested in solving the prefix problem of n inputs using p < n processors on ...
Abstract. Given a vector pi of floating-point numbers with exact sum s, we present a new algorithm w...
Parallel prefix sums algorithms are one of the simplest and most useful building blocks for construc...
Floating-point arithmetic is notoriously non-associative due to the limited precision representation...
Due to non-associativity of floating-point operations and dynamic scheduling on par...
We introduce a new optimal prefix computation algorithm on linked lists which builds upon the sparse...