While many parallel applications exhibit good spatial locality, other important codes in areas like graph problem-solving or CAD do not. Often, these irregular codes contain small records accessed via pointers. Consequently, while the former applications benefit from long cache lines, the latter prefer short lines. One good solution is to combine short lines with prefetching. In this way, each application can exploit the amount of spatial locality that it has. However, prefetching, if provided, should also work for the irregular codes. This paper presents a new prefetching scheme that, while usable by regular applications, is specifically targeted to irregular ones: Memory Binding and Group Prefetching. The idea is to hardware-bind and pr...
Many modern data processing and HPC workloads are heavily memory-latency bound. A tempting propositi...
In the last century great progress was achieved in developing processors with extremely high computa...
grantor: University of Toronto. The latency of accessing instructions and data from the memo...
The gap between processor speed and memory latency has led to the use of caches in the memory system...
Row buffer locality is a consequence of programs' inherent spatial locality that the memory system c...
Pointer-chasing applications tend to traverse composed data structures consisting of multiple indepe...
This paper presents a novel pointer prefetching technique, called multi-chain prefetching. Multi-cha...
Due to shared cache contention and interconnect delays, data prefetching is more critical in allevi...
This dissertation considers the use of data prefetching and an alternative mechanism, data forwardin...
Software prefetching and locality optimizations are techniques for overcoming the speed gap between ...
In this paper, we examine the way in which prefetching can exploit parallelism. Prefetching has been st...
Modern computer systems spend a substantial fraction of their running time waiting for data from...
Software prefetching and locality optimizations are two techniques for overcoming the speed gap betw...
Over the last 20 years, the performance gap between CPU and memory has been steadily increasing. As ...
The speed of processors increases much faster than the memory access time. This makes memory accesse...