Data prefetching effectively reduces the negative effects of long load latencies on the performance of modern processors. Hardware prefetchers employ hardware structures to predict future memory addresses based on previous patterns. Thread-based prefetchers use portions of the actual program code to determine future load addresses for prediction. In this paper, we combine both of these techniques to address the memory performance of pointer-based applications. We combine a thread-based prefetcher, based on speculative precomputation, with a pointer cache. The pointer cache is a new hardware address predictor that tracks pointer transitions. Previously proposed thread-based prefetchers are limited in how far they can run ahead of the ma...
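Since the abstract only names the pointer cache, the following is a minimal software sketch of what such a structure could look like: a small table that records pointer transitions (the address a pointer was loaded from and the value that load returned), so a speculative precomputation thread can predict the target of an upcoming pointer dereference and prefetch it. The table size, indexing scheme, and function names below are illustrative assumptions, not the organization proposed in the paper.

#include <stdint.h>
#include <stddef.h>

#define PC_ENTRIES 1024                    /* assumed, illustrative table size */

typedef struct {
    uintptr_t addr;                        /* address the pointer was loaded from */
    uintptr_t value;                       /* pointer value observed by that load */
    int       valid;
} pc_entry;

static pc_entry pointer_cache[PC_ENTRIES];

static size_t pc_index(uintptr_t addr) {
    return (addr >> 3) & (PC_ENTRIES - 1); /* drop alignment bits; direct-mapped */
}

/* Record a pointer transition observed by a committed pointer load. */
static void pc_update(uintptr_t addr, uintptr_t value) {
    pc_entry *e = &pointer_cache[pc_index(addr)];
    e->addr  = addr;
    e->value = value;
    e->valid = 1;
}

/* Let a speculative thread predict the value of a pointer load it has not
 * executed yet, so the predicted target can be prefetched early. */
static int pc_predict(uintptr_t addr, uintptr_t *value) {
    const pc_entry *e = &pointer_cache[pc_index(addr)];
    if (e->valid && e->addr == addr) {
        *value = e->value;
        return 1;                          /* hit: *value is the predicted target */
    }
    return 0;                              /* miss: no prediction available */
}

On a linked-list traversal, for example, a hit in pc_predict would let the helper thread substitute the predicted value of node->next instead of waiting for the cache-missing load, which is how a pointer cache can help a precomputation thread run further ahead of the main thread.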
The “Memory Wall” [1] is the gap in performance between the processor and the main memory. Over the...
Conventional cache prefetching approaches can be either hardware-based, generally by using a one-blo...
A major performance limiter in modern processors is the long latency caused by data cache misses. ...
CPU speeds double approximately every eighteen months, while main memory speeds double only about ev...
Cache performance analysis is becoming increasingly important in microprocessor design. This work ex...
Memory latency is becoming an increasingly important performance bottleneck as the gap between processor ...
Efficient data supply to the processor is one of the keys to achieving high performance. However, ...
Recent technological advances are such that the gap between processor cycle times and memory cycle t...
Software-controlled data prefetching offers the potential for bridging the ever-increasing speed gap...
This paper presents a novel pointer prefetching technique, called multi-chain prefetching. Multi-cha...
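The abstract is cut off before the mechanism is described; as a rough software approximation of the multi-chain idea, the loop below keeps a prefetch cursor running down the next pointer chain while the current chain is being consumed, so the serialized misses of independent chains overlap with useful work. The names node, consume, and sum_chains are invented for illustration, and the compiler prefetch intrinsic stands in for the paper's prefetch hardware.

#include <stddef.h>

typedef struct node {
    struct node *next;
    int payload;
} node;

static long consume(const node *n) { return n->payload; }  /* stand-in for real work */

/* Traverse several independent linked lists ("chains").  While chain c is
 * being consumed, a cursor walks chain c+1 and prefetches its nodes, so that
 * chain's cache misses overlap with work on the current chain instead of
 * serializing behind it. */
static long sum_chains(node *chains[], int nchains) {
    long total = 0;
    for (int c = 0; c < nchains; c++) {
        node *ahead = (c + 1 < nchains) ? chains[c + 1] : NULL;
        for (node *n = chains[c]; n != NULL; n = n->next) {
            if (ahead) {
                __builtin_prefetch(ahead, 0, 1);            /* GCC/Clang intrinsic */
                ahead = ahead->next;
            }
            total += consume(n);
        }
    }
    return total;
}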
It is well known that memory latency is a major deterrent to achieving the maximum possible performa...
An effective method for reducing the effect of load latency in modern processors is data prefetching...
We describe a simple hardware device, the Indirect Reference Buffer, which can be used to speculativ...
This paper describes future execution (FE), a simple hardware-only technique to accelerate individu...