Abstract—Hardware specialization is an increasingly common technique to enable improved performance and energy efficiency in spite of the diminished benefits of technology scaling. This paper proposes a new approach called explicit loop specialization (XLOOPS) based on the idea of elegantly encoding inter-iteration loop dependence patterns in the instruction set. XLOOPS supports a variety of inter-iteration data- and control-dependence patterns for both single and nested loops. The XLOOPS hardware/software abstraction requires only lightweight changes to a general-purpose compiler to generate XLOOPS binaries and enables executing these binaries on: (1) traditional microarchitectures with minimal performance impact, (2) specialized micr...
Over the past 20 years, increases in processor speed have dramatically outstripped performance incre...
Loop-nests in most scientific applications perform repetitive operations on array(s) and account for...
Abstract: In this paper, an approach to the problem of exploiting parallelism within nested loops is ...
Hardware specialization is becoming an increasingly common technique to enable improved performance...
Since the invention of the microprocessor in 1971, the computational capacity of the microprocessor ...
An optimizing compiler cannot generate one best code pattern for all input data. There is no ‘one op...
Graduation date: 2009. General purpose computer systems have seen increased performance potential thro...
Increasing demands for energy efficiency constrain emerging hardware. These new hardware trends chal...
Serious physical design issues are breaking down traditional abstractions in computer architecture...
The level of Thread-Level Parallelism (TLP), Instruction-Level Parallelism (ILP), and Memory-Lev...
Control divergence poses many problems in parallelizing loops. While predicated execution is commonl...
Loops are the main source of parallelism in scientific programs. Hence, several techniques were dev...
In this tutorial, we address the problem of restructuring a (possibly sequential) program to improve...