In this work, we demonstrate the potential effectiveness of a source-to-source framework for automatically optimizing a sub-class of affine programs on the Intel Many Integrated Core (MIC) Architecture. Data locality is achieved through complex, automated loop transformations within the polyhedral framework that enable parallel tiling, and the resulting tiles are processed by an aggressive automatic SIMD vector code generator. We evaluate the effectiveness of this approach on tensor contraction kernels. We show a mean improvement of 1.86× over existing compiler techniques for single-core performance, and, combined with automatic parallelization, we achieve 14.56× the performance of Intel's ICC compiler on MIC.
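As a rough illustration of the kind of loop structure that tiling produces for a tensor contraction, the following is a minimal hand-written sketch, not the output of the framework described above. It assumes the simplest two-index contraction, C[i][j] += sum over k of A[i][k] * B[k][j], on square row-major arrays. The names `contract_naive`, `contract_tiled`, `min_sz`, and the tile size `TILE` are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile size; a real framework would tune this per architecture. */
#define TILE 4

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Reference contraction: C[i][j] += sum_k A[i][k] * B[k][j] (row-major, n x n). */
void contract_naive(size_t n, const double *A, const double *B, double *C) {
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++)
            for (size_t j = 0; j < n; j++)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
}

/* Tiled variant: the three outer loops walk over tiles, the three inner
 * loops work inside one tile. Distinct ii tiles write disjoint rows of C,
 * so the ii loop could be run in parallel. The innermost j loop is
 * unit-stride in both B and C, which makes it a natural SIMD candidate. */
void contract_tiled(size_t n, const double *A, const double *B, double *C) {
    assert(TILE > 0);
    for (size_t ii = 0; ii < n; ii += TILE)
        for (size_t kk = 0; kk < n; kk += TILE)
            for (size_t jj = 0; jj < n; jj += TILE)
                for (size_t i = ii; i < min_sz(ii + TILE, n); i++)
                    for (size_t k = kk; k < min_sz(kk + TILE, n); k++) {
                        const double a = A[i * n + k]; /* reused across the j loop */
                        for (size_t j = jj; j < min_sz(jj + TILE, n); j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
}
```

Both functions accumulate into C, so the caller is expected to zero-initialize it. The `min_sz` bounds handle problem sizes that are not a multiple of `TILE`.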
SIMD hardware accelerators offer an alternative to manycores when energy consumption and performance ...
The polyhedral model for loop parallelization has proved to be an effective tool for advanced optim...
Recent advances in polyhedral compilation technology have made it feasible to automatically transfor...
Modern compilers offer more and more capabilities to automatically parallelize code regions if these...
Data locality and parallelism are critical optimization objectives for performance on modern multi-c...
Recent extensions to the Intel® Architecture feature the SIMD technique to enhance the performance ...
On modern architectures, a missed optimization can translate into performance degradations reaching ...
In recent years, there has been much effort in commercial compilers (icc, gcc) to exploit efficien...
Although Single Instruction Multiple Data (SIMD) units are available in general purpose processors a...
Power consumption and fabrication limitations are increasingly playing significant roles in the desi...
Stencil computation represents an important numerical kernel in scientific com...
This paper describes transformation techniques for out-of-core programs (i.e., those that deal with...
To this day, polyhedral optimizing compilers use either extremely rigid (but accurate) cost models, ...
The goal of this dissertation is to give programmers the ability to achieve high performance by focu...
Our aim is to apply program transformations to stencil codes in order ...