The key common bottleneck in most stencil codes is data movement, and prior research has shown that optimisations which improve data locality by scheduling across loops do particularly well. However, in many large PDE applications it is not possible to apply such optimisations through compilers because there are many options, execution paths and data per grid point, many dependent on run-time parameters, and the code is distributed across different compilation units. In this paper, we adapt the data-locality-improving optimisation called iteration space slicing for use in large OPS applications on both shared-memory and distributed-memory systems, relying on run-time analysis and delayed execution. We evaluate our approach on a number o...
Abstract—Increasingly, the main bottleneck limiting performance on emerging multi-core and many-core...
Spatial computing devices have been shown to significantly accelerate stencil computations, but have...
A widely used class of codes are stencil codes. Their general structure is very simple: data points ...
Application codes reliably achieve performance far less than the advertised capabilities of existing...
Performance optimization of stencil computations has been widely studied in the literature, since th...
Abstract. This paper proposes tiling techniques based on data dependencies and not in code structur...
We present a new cache oblivious scheme for iterative stencil computations that performs beyond syst...
The Polyhedral model has proven to be a valuable tool for improving memory locality and exploiting p...
In this work, we present Dido, an implicitly parallel domain-specific language (DSL) that captures h...
Abstract—Many scientific applications are organized in a data parallel way: as sequences of parallel...
In the field of scientific computation, loop tiling is an indispensable technique for improving cach...