Data cache misses reduce the performance of wide-issue processors by stalling the data supply to the processor. Prefetching data by predicting the miss address is one way to tolerate cache miss latencies. However, current applications with irregular access patterns make it difficult to predict the address accurately and sufficiently early to mask large cache miss latencies. This paper explores an alternative to predicting prefetch addresses, namely precomputing them. The Dependence Graph Precomputation scheme (DGP) introduced in this paper is a novel approach for dynamically identifying and precomputing the instructions that determine the addresses accessed by those load/store instructions marked as responsible for most data cache misses. ...
grantor: University of Toronto. The latency of accessing instructions and data from the memo...
In this dissertation, we provide hardware solutions to increase the efficiency of the cache hierarch...
The large number of cache misses of current applications coupled with the increasing cache miss late...
Abstract. Given the increasing gap between processors and memory, prefetching data into cache become...
With the continuing technological trend of ever cheaper and larger memory, most data sets in databas...
This thesis considers two approaches to the design of high-performance computers. In a single pro...
Recent technological advances are such that the gap between processor cycle times and memory cycle t...
CPU speeds double approximately every eighteen months, while main memory speeds double only about ev...
Conventional cache prefetching approaches can be either hardware-based, generally by using a one-blo...
Cache performance analysis is becoming increasingly important in microprocessor design. This work ex...
Data-intensive applications often exhibit memory referencing patterns with little data reuse, result...
As the degree of instruction-level parallelism in superscalar architectures increases, the gap betwe...
Prior work in hardware prefetching has focused mostly on either predicting regular streams with unif...
A major performance limiter in modern processors is the long latencies caused by data cache misses. ...