Loop unrolling is a widely adopted loop transformation, commonly used to enable subsequent optimizations. Straight-line-code vectorization (SLP) is one optimization that benefits from unrolling: SLP converts isomorphic instruction sequences into vector code, and since unrolling generates repeated isomorphic instruction sequences, it enables SLP to vectorize more code. However, most production compilers apply these optimizations independently and without coordination. Unrolling is commonly tuned to avoid code bloat rather than to maximize the potential for vectorization, leading to missed vectorization opportunities. We propose VALU, a novel loop unrolling heuristic that takes vectorization into account when making unrolling decisions. Our heuristi...
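The interaction between unrolling and SLP described above can be illustrated with a minimal C sketch. It is not the VALU heuristic itself, whose details are cut off in the abstract. The function names `add_rolled` and `add_unrolled4` and the unroll factor of 4 are assumptions chosen for illustration. The unrolled body contains four isomorphic statements on adjacent elements, which is the pattern an SLP vectorizer looks for; whether a given compiler actually packs them into a vector instruction depends on the target and on the flags used.

```c
#include <stddef.h>

/* Rolled form: one scalar add per iteration. The loop body gives a
 * straight-line (SLP) vectorizer nothing to pack. */
void add_rolled(float *restrict a, const float *restrict b,
                const float *restrict c, size_t n) {
    for (size_t i = 0; i < n; ++i)
        a[i] = b[i] + c[i];
}

/* Unrolled by a hypothetical factor of 4: the body now holds four
 * isomorphic statements on adjacent elements, which are candidates
 * for being merged into a single 4-wide vector add. */
void add_unrolled4(float *restrict a, const float *restrict b,
                   const float *restrict c, size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a[i + 0] = b[i + 0] + c[i + 0];
        a[i + 1] = b[i + 1] + c[i + 1];
        a[i + 2] = b[i + 2] + c[i + 2];
        a[i + 3] = b[i + 3] + c[i + 3];
    }
    /* Remainder loop for trip counts that are not a multiple of 4. */
    for (; i < n; ++i)
        a[i] = b[i] + c[i];
}
```

Both functions compute the same result. The unroll factor trades code size against the number of isomorphic statements exposed, which is the tension the abstract points to.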
We introduce Approximate Unrolling, a loop optimization that reduces execution time and energy consu...
Data-level parallelism is frequently ignored or underutilized. Achieved through vector/SIMD capabili...
In order to improve the accuracy of loop unrolling factor in the compiler, we propose a loop unrolli...
SIMD vectors are widely adopted in modern general-purpose processors as they can boost perf...
The development of embedded applications typically faces memory and/or execution time constraints. ...
Newer architectures continue to expand vector sizes and increase the number of different vector ins...
An emerging trend in processor design is the addition of short vector instructions to general-purpos...
In order to deliver the promise of Moore's Law to the end user, compilers must make decisions that ar...
Vectorization support in hardware continues to expand and grow as we still continue on superscalar a...
Loops in programs are the source of many optimizations for improving program performance, particula...
Software pipelining is a powerful technique to expose fine-grain parallelism, ...
Vectorization support in hardware continues to expand and grow as well we still continue on supersca...
It is well known that, to optimize a program for speed-up, efforts should be focused on the regions ...
Compilers base many critical decisions on abstracted architectural models. While recent research has...