Pointer-chasing applications tend to traverse composed data structures consisting of multiple independent pointer chains. While the traversal of any single pointer chain leads to the serialization of memory operations, the traversal of independent pointer chains provides a source of memory parallelism. This paper presents multi-chain prefetching, a technique that utilizes offline analysis and a hardware prefetch engine to prefetch multiple independent pointer chains simultaneously, thus exploiting interchain memory parallelism for the purpose of memory latency tolerance. This paper makes three contributions. First, we introduce a scheduling algorithm that identifies independent pointer chains in pointer-chasing codes and computes a prefetch...
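The claim that a single chain serializes memory operations while independent chains expose memory parallelism can be illustrated with a toy latency model. This is only a sketch under strong assumptions, not the scheduling algorithm or prefetch engine the paper describes: every pointer dereference is assumed to miss in the cache, the miss latency is assumed constant, and the overlapped case assumes an unlimited number of outstanding misses. The function names and the `miss_latency` parameter are hypothetical.

```python
def serialized_cycles(chain_lengths, miss_latency):
    """Cycles when all chains are walked one after another.

    Each node's address is only known once the previous node has
    arrived, so every miss in every chain is paid in sequence.
    """
    return sum(chain_lengths) * miss_latency


def overlapped_cycles(chain_lengths, miss_latency):
    """Cycles when independent chains are walked concurrently.

    Misses within one chain still serialize, but misses from different
    chains overlap, so (in this idealized model) the longest chain
    bounds the total time.
    """
    return max(chain_lengths, default=0) * miss_latency


if __name__ == "__main__":
    chains = [4, 4, 4]   # three independent chains, four nodes each (assumed)
    latency = 100        # assumed constant miss latency, in cycles
    print(serialized_cycles(chains, latency))
    print(overlapped_cycles(chains, latency))
```

In this model the benefit grows with the number of independent chains and shrinks when one chain is much longer than the rest, since that chain's serialized misses dominate.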
As the difference in speed between processor and memory system continues to increase, it is...
Recent advances in integrating logic and DRAM on the same chip potentially open up new avenues for a...
A well known performance bottleneck in computer architecture is the so-called memory wall. This term...
This paper presents a novel pointer prefetching technique, called multi-chain prefetching. Multi-cha...
While many parallel applications exhibit good spatial locality, other important codes in areas like ...
This dissertation considers the use of data prefetching and an alternative mechanism, data forwardin...
CPU speeds double approximately every eighteen months, while main memory speeds double only about ev...
Memory latency is becoming an increasingly important performance bottleneck as the gap between processor ...
Memory latency has always been a major issue in shared-memory multiprocessors and high-speed systems...
grantor: University of Toronto. The latency of accessing instructions and data from the memo...
Prefetching, i.e., exploiting the overlap of processor computations with data accesses, is one of s...
Data prefetching effectively reduces the negative effects of long load latencies on the performance ...
Software-controlled data prefetching offers the potential for bridging the ever-increasing speed gap...
Recent technological advances are such that the gap between processor cycle times and memory cycle t...
A multiprocessor prefetch scheme is described in which a miss is followed by a prefetch of a group o...