Abstract—Floating-point arithmetic is notoriously non-associative because its limited-precision representation requires intermediate values to be rounded to fit in the available precision. The resulting cyclic dependency in floating-point accumulation inhibits parallelization of the computation, including efficient use of pipelining. In practice, however, we observe that floating-point operations are "mostly" associative. This observation can be exploited to parallelize floating-point accumulation using a form of optimistic concurrency. In this scheme, we first compute an optimistic associative approximation to the sum and then relax the computation by iteratively propagating errors until the correct sum is obtained. We map this computa...
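Both ideas can be illustrated with a small sequential sketch. In Python, `(0.1 + 0.2) + 0.3` and `0.1 + (0.2 + 0.3)` evaluate to different doubles, which is the non-associativity at issue. The code below is a minimal sketch of optimistic accumulation followed by error relaxation. It assumes IEEE 754 binary64 with round-to-nearest and no overflow. It is not the authors' design. The pairwise tree stands in for a parallel reduction, and Knuth's TwoSum is our choice for capturing each rounding error exactly. The helper names (`two_sum`, `tree_pass`, `optimistic_sum`), the pass budget, and the stopping rule are all hypothetical.

```python
from typing import List, Optional, Tuple


def two_sum(a: float, b: float) -> Tuple[float, float]:
    """Knuth's TwoSum: returns s = fl(a + b) and the rounding error e,
    so that a + b == s + e holds exactly (round-to-nearest, no overflow)."""
    s = a + b
    b_virtual = s - a
    a_virtual = s - b_virtual
    e = (a - a_virtual) + (b - b_virtual)
    return s, e


def tree_pass(values: List[float]) -> Tuple[float, List[float]]:
    """One optimistic pass: a pairwise (tree-shaped) reduction that treats
    + as associative but records every nonzero rounding error it incurs.
    Invariant: exact sum of values == s + exact sum of errors."""
    errors: List[float] = []
    level = list(values)
    while len(level) > 1:
        nxt: List[float] = []
        for i in range(0, len(level) - 1, 2):
            s, e = two_sum(level[i], level[i + 1])
            nxt.append(s)
            if e != 0.0:
                errors.append(e)
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # odd element is carried up unchanged
        level = nxt
    return (level[0] if level else 0.0), errors


def optimistic_sum(values: List[float], max_passes: int = 64) -> float:
    """Optimistic associative sum, then relaxation: feed the captured
    errors back in until they stop changing the result."""
    vals = list(values)
    prev: Optional[float] = None
    s = 0.0
    for _ in range(max_passes):
        s, errors = tree_pass(vals)
        if not errors:
            return s  # every addition was exact, so s is the exact total
        if s == prev:
            return s  # heuristic stop (assumption): errors no longer move s
        prev = s
        # Errors go first so they combine with each other before meeting s;
        # the exact total of vals is unchanged by this rearrangement.
        vals = errors + [s]
    return s  # pass budget exhausted: return the current approximation
```

Evaluated left to right, `(1e16 + 1.0) - 1e16` gives `0.0`. By contrast, `optimistic_sum([1e16, 1.0, -1e16])` returns `1.0`. The first pass yields `0.0` with a captured error of `1.0`, and the second pass adds that error back exactly. Each pass only redistributes rounding errors without discarding any, so the exact total never changes. The stopping rule (leading term unchanged between passes) is a simplification, and a production version would need a rigorous test for correct rounding.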