Control divergence poses many problems in parallelizing loops. While predicated execution is commonly used to convert control dependence into data dependence, it often incurs high overhead because it allocates resources equally to both branches of a conditional statement regardless of their execution frequencies. For loops with unbalanced conditionals, we propose a software transformation that divides a loop into two or three smaller loops so that the condition is evaluated only in the first loop, while the less frequent branch is executed in the second loop much more efficiently than in the original loop. To reduce the overhead of extra data transfer caused by the loop fission, we also present a hardware extension for...
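As a rough illustration of the loop fission described in this abstract, the sketch below splits a loop with an unbalanced conditional into two loops. It is not the authors' implementation: the function names (`frequent_path`, `rare_path`, `original_loop`, `fissioned_loops`), the threshold test, and the choice to keep the frequent branch in the first loop are all assumptions made for the example, and it further assumes that iterations carry no dependence on one another.

```python
def frequent_path(x):
    # Hypothetical cheap computation for the common case.
    return x + 1


def rare_path(x):
    # Hypothetical expensive computation for the uncommon case.
    return x * x - 3


def original_loop(data, threshold):
    # Single loop: every iteration carries both branches of the conditional.
    out = [0] * len(data)
    for i, x in enumerate(data):
        if x > threshold:  # assumed to be rarely true
            out[i] = rare_path(x)
        else:
            out[i] = frequent_path(x)
    return out


def fissioned_loops(data, threshold):
    out = [0] * len(data)
    rare_indices = []

    # Loop 1: the condition is evaluated here and nowhere else.
    # The frequent branch runs immediately; rare iterations are only recorded.
    for i, x in enumerate(data):
        if x > threshold:
            rare_indices.append(i)
        else:
            out[i] = frequent_path(x)

    # Loop 2: runs the rare branch over the compacted index list,
    # so it contains no conditional at all.
    for i in rare_indices:
        out[i] = rare_path(data[i])

    return out
```

The list `rare_indices` is the extra data passed between the two loops; this is the kind of transfer overhead the abstract says the hardware extension is meant to reduce.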
This paper proposes a new processor architecture for handling hard-to-predict branches, the diverge-...
Pipelined microprocessors allow the simultaneous execution of several machine instructions at a time...
Conditional branches are expensive. Branches require a significant percentage of execution cycles si...
Parallelizing compilers promise to exploit the parallelism available in a given program, particularl...
Predication is an essential technique to accelerate kernels with control flow on CGRAs. While state-...
Pipelining algorithms are typically concerned with improving only the steady-state performance, or t...
Nested loops represent a significant portion of application runtime in multimedia and DSP applicatio...
Parallelizing compilers do not handle loops in a satisfactory manner. Fine-grain transformations ...
control dependences, recurrences, parallelism, control height reduction, back-substitution, blocked ...
We present a novel loop transformation technique, particularly well suited for optimizing embedded c...
Coarse Grain Reconfigurable Arrays (CGRAs) are promising accelerators capable of achievin...
Developing efficient programs for many of the current parallel computers is not easy due to the arch...
Parallelizing compilers do not handle loops in a satisfactory manner. Fine-grain transformations cap...
This article studies an important open problem in backend compilation regardin...
Software pipelining is a powerful technique to expose fine-grain parallelism, ...