The freedom to reorder computations involving associative operators has been widely recognized and exploited in designing parallel algorithms and, to a more limited extent, in optimizing compilers. In this paper, we develop a novel framework utilizing the associativity and commutativity of operations in regular loop computations to enhance register reuse. Stencils represent a particular class of important computations where the optimization framework can be applied to enhance performance. We show how stencil operations can be implemented to better exploit register reuse and reduce load/stores. We develop a multi-dimensional retiming formalism to characterize the space of valid implementations in conjunction with other program transformati...
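To make the reordering idea in the abstract concrete, here is a minimal sketch in Python. It is not the paper's framework or its retiming formalism; the 3x3 box-sum stencil and the names `box3x3_naive` and `box3x3_reassociated` are assumptions chosen only for illustration. Because addition is associative and commutative, the nine-term sum can be regrouped into three column sums, and two of those column sums can be carried over (as if held in registers) from one output point to the next, so each step reads three new inputs instead of nine.

```python
def box3x3_naive(a):
    """Sum the 3x3 neighborhood of each interior point, reading all 9 inputs every time."""
    n, m = len(a), len(a[0])
    out = [[0] * m for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            out[i][j] = sum(a[i + di][j + dj]
                            for di in (-1, 0, 1)
                            for dj in (-1, 0, 1))
    return out


def box3x3_reassociated(a):
    """Same result, with the sum regrouped by columns so partial sums are reused.

    Hypothetical illustration: c0, c1, c2 stand in for registers holding
    the column sums of columns j-1, j, and j+1.
    """
    n, m = len(a), len(a[0])
    out = [[0] * m for _ in range(n)]
    if m < 3:
        return out  # no interior columns, nothing to compute
    for i in range(1, n - 1):
        # Column sum over rows i-1, i, i+1 for a given column j.
        def col(j):
            return a[i - 1][j] + a[i][j] + a[i + 1][j]

        c0, c1 = col(0), col(1)
        for j in range(1, m - 1):
            c2 = col(j + 1)           # only 3 new reads per output point
            out[i][j] = c0 + c1 + c2  # regrouped 9-term sum
            c0, c1 = c1, c2           # rotate the "registers"
    return out


grid = [[(3 * i + 7 * j) % 11 for j in range(6)] for i in range(5)]
same = box3x3_naive(grid) == box3x3_reassociated(grid)
```

The example uses integers so that both versions agree exactly. With floating-point data, regrouping the additions can change rounding, which is one reason a compiler needs permission to treat the operator as associative.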
Article in a peer-reviewed scientific journal. First-order languages based on rewrite rules ...
Loop fusion is a reordering transformation that merges multiple loops into a single loop. It can inc...
Most conventional compilers fail to allocate array elements to registers because standard data-flow...
Register allocation is generally considered a practically solved problem. For ...
Application codes reliably achieve performance far less than the advertised capabilities of existing...
This paper describes a new technique for optimizing serial and parallel stencil- and stencil-like op...
Performance optimization of stencil computations has been widely studied in the literature, ...
The key common bottleneck in most stencil codes is data movement, and prior research has shown that ...
In the context of developing a compiler for Alpha, a functional data-parallel language based on ...
In this paper, we propose a compilation scheme to analyze and exploit the implicit reuse...
Increasingly complex hardware makes the design of effective compilers difficul...
The generation of efficient sequential code for synchronous data-flow language...