Many modern data processing and HPC workloads are heavily memory-latency bound. A tempting solution is software prefetching, where special non-blocking loads bring data into the cache hierarchy just before it is required. However, such prefetches are difficult to insert in a way that actually improves performance, and techniques for automatic insertion are currently limited. This article develops a novel compiler pass to automatically generate software prefetches for indirect memory accesses, a special class of irregular memory accesses often seen in high-performance workloads. We evaluate this across a wide set of systems, all of which benefit from the technique. We then evaluate the extent to which good prefetch i...
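To make the idea of software prefetching for an indirect access concrete, here is a minimal hand-written sketch in C. It is not the output of the compiler pass described above, only an illustration of the pattern. It assumes GCC or Clang, which provide the `__builtin_prefetch` builtin. The function names and the look-ahead distance `PREFETCH_DIST` are hypothetical and would need tuning per machine.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical look-ahead distance in loop iterations (an assumption for
   illustration; a good value depends on the target machine). */
#define PREFETCH_DIST 64

/* Baseline: data[idx[i]] is an indirect access, so its address is only
   known after idx[i] has been loaded. */
int64_t sum_indirect(const int32_t *data, const uint32_t *idx, size_t n)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += data[idx[i]];
    return sum;
}

/* Same loop with hand-inserted software prefetches. */
int64_t sum_indirect_prefetch(const int32_t *data, const uint32_t *idx, size_t n)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* Prefetch the index array further ahead, so that the index value
           read for the indirect prefetch below is itself likely cached. */
        if (i + 2 * PREFETCH_DIST < n)
            __builtin_prefetch(&idx[i + 2 * PREFETCH_DIST], 0, 3);

        /* idx[i + PREFETCH_DIST] is a real load, not a prefetch, so it
           must be bounds-checked before it is used to form an address. */
        if (i + PREFETCH_DIST < n)
            __builtin_prefetch(&data[idx[i + PREFETCH_DIST]], 0, 3);

        sum += data[idx[i]];
    }
    return sum;
}
```

Both functions return the same result, since the prefetches are only hints. Whether the second is faster depends on the hardware, the size of `data`, and the chosen distance.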
In the last century great progress was achieved in developing processors with extremely high computa...
Memory latency is becoming an increasingly important performance bottleneck as the gap between processor ...
A well-known performance bottleneck in computer architecture is the so-called memory wall. This term...
A major performance limiter in modern processors is the long latencies caused by data cache misses. ...
This work was also published as a Rice University thesis/dissertation: http://hdl.handle.net/1911/19...
Indirect memory accesses have irregular access patterns that limit the performance of conventional s...
Ever-increasing memory latencies and deeper pipelines push memory farther from the processor. Prefet...
Despite rapid increases in CPU performance, the primary obstacles to achieving higher performance in...
Despite large caches, main-memory access latencies still cause significant performance losses in man...
Prefetching, i.e., exploiting the overlap of processor computations with data accesses, is one of s...
Modern processors apply sophisticated techniques, such as deep cache hierarchies and hardware prefet...
Instruction cache miss latency is becoming an increasingly important performance bottleneck, especia...
As the gap between processor and memory speeds widens, program performance is increasingly dependent...
Source code and benchmarks to implement indirect memory access prefetching in LLVM, including new ex...