The performance of superscalar processors is more sensitive to memory system delay than that of their single-issue predecessors. This paper examines alternative data access microarchitectures that effectively support compiler-assisted data prefetching in superscalar processors. In particular, a prefetch buffer is shown to be more effective than increasing the cache dimension in solving the cache pollution problem. All in all, we show that a small data cache with compiler-assisted data prefetching can achieve a performance level close to that of an ideal cache.

1 Introduction

Superscalar processors can potentially deliver more than five times the speedup of conventional single-issue processors [1]. With the total execution cycle count dramatically...
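To make the idea of compiler-assisted data prefetching concrete, the sketch below shows the kind of transformation a prefetching compiler performs: a prefetch of data needed several iterations in the future is issued ahead of its use, so the memory access overlaps with ongoing computation instead of stalling it. This is only an illustration and not the paper's microarchitecture or compiler; it uses the GCC/Clang __builtin_prefetch intrinsic, and the prefetch distance PREFETCH_AHEAD and the dot_product kernel are assumed for the example.

/* Minimal sketch of compiler-assisted (software) data prefetching.
 * A prefetch for elements PREFETCH_AHEAD iterations in the future is
 * issued each iteration, so their memory latency overlaps with the
 * multiply-accumulate work of the current iteration. */
#include <stddef.h>

#define PREFETCH_AHEAD 16  /* illustrative prefetch distance, in elements */

double dot_product(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n) {
            /* GCC/Clang builtin: rw = 0 (read), locality = 1 (low reuse) */
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 0, 1);
            __builtin_prefetch(&b[i + PREFETCH_AHEAD], 0, 1);
        }
        sum += a[i] * b[i];
    }
    return sum;
}

In a machine with a prefetch buffer, the prefetched blocks would be held in that small separate structure rather than displacing useful lines in the data cache, which is the cache-pollution concern the abstract refers to.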