Chip multiprocessors (CMPs) present a unique scenario for software data prefetching, with subtle tradeoffs between memory bandwidth and performance. In a shared-L2-based CMP, multiple cores compete for the shared on-chip cache space and the limited off-chip pin bandwidth. Purely software-based prefetching techniques tend to increase this contention, degrading performance. In some cases, a prefetch becomes harmful by evicting from the shared cache useful data whose next use comes earlier than that of the prefetched data. The fraction of such harmful prefetches usually grows with the number of cores used to execute a multi-threaded application. In this paper, we propose two complementary techniques to addre...
As data prefetching is used in embedded processors, it is crucial to reduce the wasted energy for im...
Memory latency is a major factor in limiting CPU performance, and prefetching is a well-k...
A well known performance bottleneck in computer architecture is the so-called memory wall. This term...
With off-chip memory access taking hundreds of processor cycles, getting data to the processor in a tim...
Modern processors apply sophisticated techniques, such as deep cache hierarchies and hardware prefet...
Memory access latency is a main bottleneck limiting further improvement of multi-core proces...
Compiler-directed cache prefetching has the potential to hide much of the high memory latency seen ...
Prefetching is an important technique for reducing the average latency of memory accesses in scalabl...
I/O prefetching has been employed in the past as one of the mechanisms to hide large disk latencie...
Modern processors attempt to overcome increasing memory latencies by anticipating future references ...
Memory latency has always been a major issue in shared-memory multiprocessors and high-speed systems...
Prefetch engines working on distributed memory systems behave independently by analyzing the...
Memory stalls are a significant source of performance degradation in modern processors. Data prefetc...
This work was also published as a Rice University thesis/dissertation: http://hdl.handle.net/1911/19...
Data prefetching is an effective technique to hide memory latency and thus bridge the increasing pro...