Abstract We present and evaluate a cache oblivious algorithm for stencil computations, which arise, for example, in finite-difference methods. Our algorithm applies to arbitrary stencils in n-dimensional spaces. On an “ideal cache” of size Z, our algorithm saves a factor of Θ(Z^(1/n)) cache misses compared to a naive algorithm, and it exploits temporal locality optimally throughout the entire memory hierarchy. We evaluate our algorithm in terms of the number of cache misses, and demonstrate that the memory behavior agrees with our theoretical predictions. Our experimental evaluation is based on a finite-difference solution of a heat diffusion problem, as well as a Gauss-Seidel iteration and a 2-dimensional LBMHD program, both reformulated as ...
Cache-oblivious algorithms have been advanced as a way of circumventing some of the difficulties of ...
Stencil-based kernels constitute the core of many scientific applications on block-structured grids....
Application codes reliably achieve performance far less than the advertised capabilities of existing...
We present a new cache oblivious scheme for iterative stencil computations that performs beyond syst...
This paper describes a new technique for optimizing serial and parallel stencil- and stencil-like op...
In this work, we study the cache-oblivious computation model, which is inspired by the behaviour of ...
Cache-oblivious algorithms are well understood when the cache size remains constant. Recently variab...
One important bottleneck when visualizing large data sets is the data transfer between processor and...
Abstract This paper presents asymptotically optimal algorithms for rectangular matrix transpose, FF...
We present a novel method for computing cache-oblivious layouts of large meshes that improve the per...
We are witnessing a fundamental paradigm shift in computer design. Memory has been and is becoming m...
Memory efficiency and locality have substantial impact on the performance of programs, particularly ...
Abstract Intuitively, a cache-oblivious algorithm implements an adaptive strategy which runs efficie...
Abstract The class of stencil programs involves repeatedly updating elements of array...
Cache-obliviousness is a concept first introduced by Frigo et al. in [1]. We follow their model and ...