Cache Oblivious Parallelograms in Iterative Stencil Computations

Strzodka, R.
Shaheen, M.
Pajak, D.
Seidel, H.

Publication date

January 2010

DOI

10.1145/1810085.1810096

Publisher

Association for Computing Machinery (ACM)

Abstract

We present a new cache oblivious scheme for iterative stencil computations that performs beyond system bandwidth limitations as though gigabytes of data could reside in an enormous on-chip cache. We compare execution times for 2D and 3D spatial domains with up to 128 million double precision elements for constant and variable stencils against hand-optimized naive code and the automatic polyhedral parallelizer and locality optimizer PluTo and demonstrate the clear superiority of our results. The performance benefits stem from a tiling structure that caters for data locality, parallelism and vectorization simultaneously. Rather than tiling the iteration space from inside, we take an exterior approach with a predefined hierarchy, simple regula...
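To make the idea of tiling an iterative stencil in space and time concrete, here is a minimal sketch. It is not the authors' cache oblivious scheme. It shows plain time skewing with fixed-width parallelogram tiles on a 1D domain, a simpler relative of the hierarchical approach the abstract describes. The 3-point averaging stencil, the function names and the `tile_width` parameter are assumptions chosen for illustration.

```python
def stencil_point(left, center, right):
    """3-point averaging stencil with constant weights (illustrative choice)."""
    return 0.25 * left + 0.5 * center + 0.25 * right


def naive_sweeps(u0, steps):
    """Reference version: one full pass over the domain per time step."""
    n = len(u0)
    bufs = [list(u0), list(u0)]  # ping-pong buffers; boundary cells stay fixed
    for t in range(steps):
        src, dst = bufs[t % 2], bufs[(t + 1) % 2]
        for x in range(1, n - 1):
            dst[x] = stencil_point(src[x - 1], src[x], src[x + 1])
    return bufs[steps % 2]


def parallelogram_sweeps(u0, steps, tile_width):
    """Same updates, reordered into space-time parallelograms.

    Tile k covers x in [k*B - t, (k+1)*B - t) at time step t, so each tile
    leans left by one cell per step. Tiles are processed left to right, and
    all time steps of a tile are applied before moving to the next tile, so
    the cells of a tile are revisited while they are likely still cached.
    """
    n = len(u0)
    bufs = [list(u0), list(u0)]
    # Enough tiles to cover every (x, t) pair, since x + t < n + steps.
    num_tiles = (n + steps) // tile_width + 1
    for k in range(num_tiles):
        for t in range(steps):
            src, dst = bufs[t % 2], bufs[(t + 1) % 2]
            lo = max(1, k * tile_width - t)          # clamp to the interior
            hi = min(n - 1, (k + 1) * tile_width - t)
            for x in range(lo, hi):
                dst[x] = stencil_point(src[x - 1], src[x], src[x + 1])
    return bufs[steps % 2]


if __name__ == "__main__":
    u0 = [float((i * 7) % 13) for i in range(50)]
    same = parallelogram_sweeps(u0, 20, 8) == naive_sweeps(u0, 20)
    print("tiled result matches naive result:", same)
```

Both functions perform the same arithmetic on the same inputs and differ only in the order of updates. The leftward skew ensures that every value a tile reads was already produced either earlier in the same tile or in a tile to its left. In pure Python the reordering brings no speedup; the sketch only illustrates the dependency structure that such tilings exploit.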
