Software-controlled data prefetching offers the potential for bridging the ever-increasing speed gap between the memory subsystem and today's high-performance processors. While prefetching has enjoyed considerable success in array-based numeric codes, its potential in pointer-based applications has remained largely unexplored. This paper investigates compiler-based prefetching for pointer-based applications, in particular those containing recursive data structures. We identify the fundamental problem in prefetching pointer-based data structures and propose a guideline for devising successful prefetching schemes. Based on this guideline, we design three prefetching schemes, we automate the most widely applicable scheme (greedy prefetch...
The memory hierarchy in modern architectures continues to be a major performance bottleneck. Many ex...
We describe a simple hardware device, the Indirect Reference Buffer, that can be used to speculativ...
Pointer-chasing applications tend to traverse composed data structures consisting of multiple indepe...
Memory latency is becoming an increasingly important performance bottleneck as the gap between processor ...
This paper presents a novel pointer prefetching technique, called multi-chain prefetching. Multi-cha...
CPU speeds double approximately every eighteen months, while main memory speeds double only about ev...
A major performance limiter in modern processors is the long latencies caused by data cache misses. ...
Despite rapid increases in CPU performance, the primary obstacles to achieving higher performance in...
This work was also published as a Rice University thesis/dissertation: http://hdl.handle.net/1911/19...
In recent years, processor speed has become increasingly faster than memory speed. One technique for...
Data prefetching effectively reduces the negative effects of long load latencies on the performance ...
Ever increasing memory latencies and deeper pipelines push memory farther from the processor. Prefet...
In recent years, processor speed has become increasingly faster than memory speed. One technique for...
Data-intensive applications often exhibit memory referencing patterns with little data reuse, result...
Prefetching, i.e., exploiting the overlap of processor computations with data accesses, is one of s...