We describe a simple hardware device, the Indirect Reference Buffer, that can be used to speculatively prefetch pointer-linked, sparse matrix, or dense matrix data structures into primary data caches. The indirect reference buffer (IRB) identifies recurrent patterns of memory access in such computations, and uses these patterns to prefetch data that it anticipates will be used shortly by the processor, thereby reducing the cache miss penalty associated with such references. Previously described schemes (both hardware and software) have focused on prefetching regular array references only, which makes them inadequate for computations that generate complex memory access patterns. The IRB rectifies these deficiencies. In addition, because of ...
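The core idea of pattern-recognizing prefetchers like the one described above can be illustrated with a small software model. The sketch below implements a classic stride-detecting prefetcher: a table indexed by the load instruction's PC remembers the last address and stride, and a prefetch is issued once the same stride repeats. The table layout and the repeat-stride confidence rule are illustrative assumptions, not the IRB's actual hardware design.

```python
class StridePrefetcher:
    """Toy model of a stride-based hardware prefetcher (illustrative only)."""

    def __init__(self):
        # One entry per load instruction: pc -> (last_addr, last_stride)
        self.table = {}

    def access(self, pc, addr):
        """Record a memory access; return a predicted prefetch address or None."""
        entry = self.table.get(pc)
        if entry is None:
            # First time we see this instruction: nothing to predict yet.
            self.table[pc] = (addr, None)
            return None
        last_addr, last_stride = entry
        stride = addr - last_addr
        self.table[pc] = (addr, stride)
        # Two consecutive identical non-zero strides: pattern detected,
        # so anticipate the next reference one stride ahead.
        if stride == last_stride and stride != 0:
            return addr + stride
        return None
```

For a load sweeping an array of 8-byte elements (addresses 100, 108, 116, ...), the first two accesses train the table and the third triggers a prefetch of address 124. Real hardware adds confidence counters and handles indirect (pointer-chasing) patterns, which simple stride detection cannot capture.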
Software-controlled data prefetching offers the potential for bridging the ever-increasing speed gap...
While many parallel applications exhibit good spatial locality, other important codes in areas like ...
The large number of cache misses of current applications coupled with the increasing cache miss late...
Despite rapid increases in CPU performance, the primary obstacles to achieving higher performance in...
Recent technological advances are such that the gap between processor cycle times and memory cycle t...
CPU speeds double approximately every eighteen months, while main memory speeds double only about ev...
Conventional cache prefetching approaches can be either hardware-based, generally by using a one-blo...
Many modern data processing and HPC workloads are heavily memory-latency bound. A tempting propositi...
Indirect memory accesses have irregular access patterns that limit the performance of conventional s...
Data-intensive applications often exhibit memory referencing patterns with little data reuse, result...
Data prefetching effectively reduces the negative effects of long load latencies on the performance ...
In the last century great progress was achieved in developing processors with extremely high computa...
Abstract. Given the increasing gap between processors and memory, prefetching data into cache become...
Prefetching, i.e., exploiting the overlap of processor computations with data accesses, is one of s...