Abstract: Memory access latency is a major bottleneck limiting further performance improvement of multi-core processors, and data prefetching is an effective technique for hiding data access latency. This paper proposes a new hardware prefetching technique based on future execution that integrates runahead execution. Future execution uses one core of a chip multiprocessor (CMP) to prefetch data for a thread running on another core. Runahead execution is a technique for out-of-order processors that lets the microprocessor pre-process instructions during cache-miss cycles instead of stalling. We name the combined prefetching technique Future-runahead execution, and experimental results reveal that the relative execution time of Future-runahead execution tested by SPEC2000 program reduced b...
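The runahead idea in the abstract can be illustrated with a minimal toy cycle-count model. This is not the paper's Future-runahead mechanism, and it models no second core. It only contrasts two cores:

- a core that stalls on every cache miss, and
- a core that, while a miss is outstanding, keeps pre-executing later accesses to issue prefetches for them.

Several things are hypothetical assumptions: the miss latency `MISS_LATENCY`, the one-access-per-cycle timing, the infinite cache, and the function names `run_stall` and `run_runahead`. The model also assumes every load address is independent of earlier missing loads. Real runahead cannot prefetch a load whose address depends on a missed value.

```python
MISS_LATENCY = 100  # hypothetical main-memory latency in cycles


def run_stall(trace, latency=MISS_LATENCY):
    """Baseline core: one cycle per access, plus a full stall on every miss."""
    cache = set()  # hypothetical infinite cache: a line stays once fetched
    cycle = 0
    for addr in trace:
        cycle += 1
        if addr not in cache:
            cycle += latency  # pipeline waits for memory
            cache.add(addr)
    return cycle


def run_runahead(trace, latency=MISS_LATENCY):
    """Runahead-style core: during a miss, keep scanning later accesses and
    prefetch them, then resume normal execution after the missing access."""
    ready = {}  # addr -> cycle at which its data becomes available
    cycle = 0
    for i, addr in enumerate(trace):
        cycle += 1
        if addr not in ready:
            ready[addr] = cycle + latency  # demand miss issued now
            # Runahead mode: pre-execute one later access per cycle until the
            # miss returns, issuing prefetches for lines not yet requested.
            # Results are discarded; only the prefetches survive.
            t = cycle
            for later in trace[i + 1:]:
                if t >= ready[addr]:
                    break
                t += 1
                if later not in ready:
                    ready[later] = t + latency
        # Normal mode: wait until this access's data has arrived (a hit if it
        # arrived earlier, a shorter stall if a prefetch is still in flight).
        cycle = max(cycle, ready[addr])
    return cycle


if __name__ == "__main__":
    trace = [0, 64, 128, 192]  # four independent cache lines
    print(run_stall(trace), run_runahead(trace))  # 404 104
```

With a 100-cycle latency, the four independent misses cost 404 cycles in the stalling model. In the runahead model they overlap and finish in 104 cycles. A trace dominated by repeated addresses shows no gain, and in real hardware, loads that depend on missed data would limit the benefit further.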
In the last century, great progress was achieved in developing processors with extremely high computa...
Pre-print. Memory latency is a major factor in limiting CPU performance, and prefetching is a well-k...
As the degree of instruction-level parallelism in superscalar architectures increases, the gap betwe...
This paper describes future execution (FE), a simple hardware-only technique to accelerate individu...
This paper proposes a new hardware technique for using one core of a CMP to prefetch data for a thr...
It is well known that memory latency is a major deterrent to achieving the maximum possible performa...
Instruction cache miss latency is becoming an increasingly important performance bottleneck, especia...
Scaling the performance of applications with little thread-level parallelism is one of the most seri...
A well-known performance bottleneck in computer architecture is the so-called memory wall. This term...
Memory stalls are a significant source of performance degradation in modern processors. Data prefetc...
Chip multiprocessors (CMPs) present a unique scenario for software data prefetching with subtle trad...
The exponentially increasing gap between processors and off-chip memory, as measured in processor cy...
In this dissertation, we provide hardware solutions to increase the efficiency of the cache hierarch...
Data prefetching has been considered an effective way to mask data access latency caused by cache mi...
Runahead execution improves processor performance by accurately prefetching long-latency memory acce...