A number of compute-intensive applications suffer from performance loss due to the lack of instruction-level parallelism in sequences of dependent instructions. This is particularly true on wide-issue architectures with large register banks, when the memory hierarchy (locality and bandwidth) is not the dominant bottleneck. We consider two real applications from computational biology and from cryptanalysis, characterized by long sequences of dependent instructions, irregular control flow and intricate scalar and array dependence patterns. Although these applications exhibit excellent memory locality and branch-prediction behavior, state-of-the-art loop transformations and back-end optimizations are unable to exploit much instruction-leve...
The memory wall places a significant limit on performance for many modern workloads. These applicati...
Processors have become faster at a much quicker rate than memory access time, creating a wide gap betw...
Instruction-level redundancy is an effective scheme to reduce the susceptibility of microp...
One of the main performance bottlenecks of processors today is the discrepancy between processor and...
From the 1960s to the present, the evolution of supercomputers has gone through three revolutions: (i) the ar...
The increasing density of VLSI circuits has motivated research into ways to utilize large area budge...
With the rise of chip-multiprocessors, the problem of parallelizing general-purpose programs has onc...
Irregular applications have frequent data-dependent memory accesses and control flow. They arise in ...
This work was also published as a Rice University thesis/dissertation: http://hdl.handle.net/1911/16...
We detail an algorithm implemented in the R-Stream compiler to perform controlled array expansion ...
The trend in high-performance microprocessor design is toward increasing computational power on the ...
With speculative thread-level parallelization, codes that cannot be fully compiler-analyzed are aggr...
Specialized accelerators are increasingly attractive solutions to continue expected generational per...
Our goal is to dramatically increase the performance of uniprocessors through the exploitation of in...
We develop a technique for extracting parallelism from ordinary (sequential) programs. The technique...