Traditional software-controlled data cache prefetching is often ineffective due to the lack of runtime cache-miss and miss-address information. To overcome this limitation, we implement runtime data cache prefetching in the dynamic optimization system ADORE (ADaptive Object code RE-optimization). Its performance has been compared with static software prefetching on the SPEC2000 benchmark suite. Runtime cache prefetching shows better performance. On an Itanium 2 based Linux workstation, it can increase performance by more than 20% over static prefetching on some benchmarks. For benchmarks that do not benefit from prefetching, the runtime optimization system adds only 1%-2% overhead. We have also collected cache miss profiles to guide stati...
Despite rapid increases in CPU performance, the primary obstacles to achieving higher performance in...
As data prefetching is used in embedded processors, it is crucial to reduce the wasted energy for im...
The large number of cache misses of current applications coupled with the increasing cache miss late...
High performance processors employ hardware data prefetching to reduce the negative performance impa...
Data-intensive applications often exhibit memory referencing patterns with little data reuse, result...
A major performance limiter in modern processors is the long latencies caused by data cache misses. ...
Compiler-directed cache prefetching has the potential to hide much of the high memory latency seen ...
Abstract. Given the increasing gap between processors and memory, prefetching data into cache become...
Processor performance has increased far faster than memories have been able to keep up with, forcing...
Modern processors apply sophisticated techniques, such as deep cache hierarchies and hardware prefet...
CPU speeds double approximately every eighteen months, while main memory speeds double only about ev...
Memory latency has always been a major issue in shared-memory multiprocessors and high-speed systems...
This work was also published as a Rice University thesis/dissertation: http://hdl.handle.net/1911/19...
Memory latency is becoming an increasingly important performance bottleneck as the gap between processor ...
One of the significant issues in processor architecture is overcoming memory latency. Prefetching ca...