This paper describes an approach to performance optimization using modified macro dataflow graphs, which contain nodes representing the loops and data involved in a stencil computation. The target applications are existing scientific codes that contain a series of stencil computations sharing data, i.e., loop chains. The performance of such applications can be improved by modifying their execution schedules. However, modern architectures are increasingly constrained by memory subsystem bandwidth, so to fully realize the locality benefits of schedule changes, temporary storage allocation must also be minimized. We present a macro dataflow graph variant that includes dataset nodes, a cost model that quantif...
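To make the notion of a loop chain concrete, the following is a minimal sketch, not the paper's implementation. It assumes a hypothetical chain of two 1D stencil loops (a two-point average followed by a difference); the function names `chain_unfused` and `chain_fused` are illustrative only. It shows how changing the schedule (here, fusing the loops) can shrink the temporary storage that sits between them.

```python
def chain_unfused(a):
    """Original schedule: run loop 1 to completion, then loop 2.

    The intermediate dataset `tmp` is fully materialized.
    Assumes len(a) >= 3 (an assumption of this sketch).
    """
    n = len(a)
    tmp = [0.0] * (n - 1)                 # temporary storage: n - 1 values
    for i in range(n - 1):                # loop 1: two-point average stencil
        tmp[i] = 0.5 * (a[i] + a[i + 1])
    out = [0.0] * (n - 2)
    for i in range(n - 2):                # loop 2: reads the data loop 1 wrote
        out[i] = tmp[i + 1] - tmp[i]
    return out, len(tmp)


def chain_fused(a):
    """Fused schedule: each intermediate value is consumed right after it is produced.

    Only two intermediate values (`prev`, `cur`) are live at any time.
    Assumes len(a) >= 3 (an assumption of this sketch).
    """
    n = len(a)
    out = [0.0] * (n - 2)
    prev = 0.5 * (a[0] + a[1])            # first value of the intermediate dataset
    for i in range(n - 2):
        cur = 0.5 * (a[i + 1] + a[i + 2])  # next intermediate value
        out[i] = cur - prev
        prev = cur
    return out, 2                          # two scalars replace the tmp array


data = [1.0, 4.0, 9.0, 16.0, 25.0]
unfused_out, unfused_tmp = chain_unfused(data)
fused_out, fused_tmp = chain_fused(data)
print(unfused_out == fused_out, unfused_tmp, fused_tmp)
```

Both schedules compute the same result, but the fused version keeps two scalars live instead of an array of `n - 1` values. In a graph representation with dataset nodes, this corresponds to reducing the size of the intermediate dataset node between the two loop nodes.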
Typical parallelization approaches such as OpenMP and CUDA provide constructs for parallelizing and ...
Large-grain synchronous dataflow graphs or multi-rate graphs have the distinct feature that the node...
This paper describes a method of analysis for detecting and minimizing memory latency using a direct...
Science and engineering advancements depend more and more on computational simulations. These simula...
This research proposes an intermediate compiler representation designed for optimization, with an em...
This paper minimizes the buffer size and the buffer memory management performance overhead for a syn...
Functional or control parallelism is an effective way to increase speedups in multicomputers. Prog...
Polyhedral techniques for program transformation are now used in several proprietary and open source...
The key common bottleneck in most stencil codes is data movement, and prior research has shown that ...
It is now widely recognized that increased levels of parallelism are a necessary condition for impro...
The macro-dataflow model of execution has been used in scheduling heuristics for directed acyclic gr...
The dataflow computing model is a simple yet powerful mechanism for constructing distributed visualizati...
Modern parallel programming models perform their best under the particular patterns they are tuned t...
A mini-graph is a dataflow graph that has an arbitrary internal size and shape but the interface of ...