Because of shared-cache contention and interconnect delays, data prefetching is increasingly critical for alleviating the penalties of growing memory latencies and demands on chip multiprocessors (CMPs). Through detailed analysis of SPEC2000 applications, we find that some nearby data memory references often exhibit highly repeated patterns with long but equal block reuse distances. These references can form a coterminous group (CG). We introduce coterminous locality: when a member of a CG is referenced, the remaining members will likely be referenced in the near future. Based on this behavior, we implement a novel CG data prefetcher on CMPs. Performance evaluations show that the proposed prefetcher can accurately cove...
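To make the idea of a coterminous group concrete, here is a minimal sketch in Python. It is not the prefetcher described in the abstract. It assumes that "block reuse distance" means the number of distinct blocks touched between two accesses to the same block, and that "nearby" means consecutive in the trace. The function names and the `min_distance` threshold are hypothetical, chosen only for illustration.

```python
def reuse_distances(trace):
    """Return, for each access, the number of distinct blocks touched
    since the previous access to the same block (None on first touch).
    Assumed definition of block reuse distance, for illustration only."""
    last_seen = {}
    distances = []
    for i, block in enumerate(trace):
        if block in last_seen:
            between = trace[last_seen[block] + 1:i]
            distances.append(len(set(between)))
        else:
            distances.append(None)
        last_seen[block] = i
    return distances


def coterminous_groups(trace, min_distance=2):
    """Group consecutive accesses that share the same reuse distance.
    min_distance is a hypothetical cutoff for a 'long' reuse distance."""
    groups = []
    current, current_d = [], None
    for block, d in zip(trace, reuse_distances(trace)):
        if d is not None and d >= min_distance and d == current_d:
            current.append(block)
            continue
        # The run ended: keep it only if it has more than one member.
        if len(current) > 1:
            groups.append((current_d, current))
        if d is not None and d >= min_distance:
            current, current_d = [block], d
        else:
            current, current_d = [], None
    if len(current) > 1:
        groups.append((current_d, current))
    return groups


if __name__ == "__main__":
    trace = ["A", "B", "C", "X", "Y", "Z", "A", "B", "C"]
    print(coterminous_groups(trace))
```

In this toy trace, blocks A, B and C are each reused after the same number of intervening distinct blocks, so they fall into one group. A prefetcher built on this observation could, on seeing A again, fetch B and C ahead of their use.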
The large number of cache misses of current applications coupled with the increasing cache miss late...
Both on-chip resource contention and off-chip latencies have a significant impact on memor...
Chip multiprocessors (CMPs) present a unique scenario for software data prefetching with subtle trad...
While many parallel applications exhibit good spatial locality, other important codes in areas like ...
The gap between processor speed and memory latency has led to the use of caches in the memory system...
The speed of processors increases much faster than the memory access time. This makes memory accesse...
Row buffer locality is a consequence of programs' inherent spatial locality that the memory system c...
In this paper, we present our design of a high performance prefetcher, which exploits various locali...
Prefetching in shared-memory multiprocessor systems is an increasingly difficult problem. As system ...
Software prefetching and locality optimizations are two techniques for overcoming the speed gap betw...
Recent technological advances are such that the gap between processor cycle times and memory cycle t...
The memory system remains a bottleneck in modern computer systems. Traditionally, designers have use...
Data cache misses reduce the performance of wide-issue processors by stalling the data supply to the...
Memory access latency is the primary performance bottleneck in modern computer systems. Prefetching...