This paper introduces the idea of using a User-Level Memory Thread (ULMT) for correlation prefetching. In this approach, a user thread runs on a general-purpose processor in main memory, either in the memory controller chip or in a DRAM chip. The thread performs correlation prefetching in software, sending the prefetched data into the L2 cache of the main processor. This approach requires minimal hardware beyond the memory processor: the correlation table is a software data structure that resides in main memory, while the main processor only needs a few modifications to its L2 cache so that it can accept incoming prefetches. In addition, the approach has wide usability, as it can effectively prefetch even for irregular applications. Finally...
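As a rough illustration of the software correlation table described above, the following is a minimal sketch of a pair-based table that maps each miss address to its most recently observed successor misses. The class name `CorrelationTable`, the `num_succ` parameter, and the `on_miss` method are hypothetical names chosen for this example and do not come from the paper; the actual ULMT algorithms and table organization may differ.

```python
class CorrelationTable:
    """Hypothetical pair-based correlation table (illustrative only).

    Maps a miss address to a short most-recently-used list of the
    miss addresses that followed it in the past.
    """

    def __init__(self, num_succ=2):
        self.num_succ = num_succ  # assumed cap on successors kept per entry
        self.table = {}           # miss address -> successors, MRU first
        self.last_miss = None     # previously observed miss address

    def on_miss(self, addr):
        """Record one observed miss and return addresses to prefetch."""
        # Learning step: remember addr as a successor of the previous miss.
        if self.last_miss is not None:
            succ = self.table.setdefault(self.last_miss, [])
            if addr in succ:
                succ.remove(addr)
            succ.insert(0, addr)       # most recent successor goes first
            del succ[self.num_succ:]   # drop the oldest beyond the cap
        self.last_miss = addr
        # Prefetching step: propose the successors recorded for addr.
        return list(self.table.get(addr, []))


if __name__ == "__main__":
    table = CorrelationTable(num_succ=2)
    for miss in [0x100, 0x240, 0x100, 0x380, 0x100]:
        candidates = table.on_miss(miss)
        print(hex(miss), [hex(a) for a in candidates])
```

In this sketch the table is an ordinary in-memory dictionary, which mirrors the idea that the correlation table is a software data structure rather than dedicated hardware; a thread running near memory would call something like `on_miss` for each observed miss and push the returned addresses toward the L2 cache.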
Hardly predictable data addresses in many irregular applications have rendered prefetching ineffec...
Prefetching is one approach to reducing the latency of memory operations in modern computer systems....
In the last century great progress was achieved in developing processors with extremely high computa...
Recent technological advances are such that the gap between processor cycle times and memory cycle t...
Modern computer systems spend a substantial fraction of their running time waiting for data from...
As the gap between processor performance and memory performance continues to broaden with time, tech...
Scaling the performance of applications with little thread-level parallelism is one of the most seri...
Processor performance has increased far faster than memories have been able to keep up with, forcing...
Despite large caches, main-memory access latencies still cause significant performance losses in man...
In this paper, we examine the way in which prefetching can exploit parallelism. Prefetching has been st...
Prefetching, i.e., exploiting the overlap of processor computations with data accesses, is one of s...
Memory latency has always been a major issue in shared-memory multiprocessors and high-speed systems...
Software prefetching and locality optimizations are techniques for overcoming the speed gap between ...
This work was also published as a Rice University thesis/dissertation: http://hdl.handle.net/1911/19...
A major performance limiter in modern processors is the long latency caused by data cache misses. ...