Hard-to-predict data addresses in many irregular applications have rendered prefetching ineffective. In many cases, the only accurate way to predict these addresses is to directly execute the code that generates them. As multithreaded architectures become increasingly popular, one attractive approach is to use idle threads on these machines to perform pre-execution (essentially the combined act of speculative address generation and prefetching) to accelerate the main thread. In this paper, we propose such a pre-execution technique for simultaneous multithreading (SMT) processors. By using software to control pre-execution, we are able to handle some of the most important access patterns that are typically difficult to prefetch. Compa...
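To make software-controlled pre-execution concrete, here is a minimal C sketch, assuming POSIX threads, C11 atomics, and the GCC/Clang `__builtin_prefetch` builtin. It is not the technique proposed in the paper. A helper thread executes only the address-generating slice of a linked-list traversal (the pointer chase) and stays a bounded distance ahead of the main thread, so the main thread's loads are more likely to hit in a shared cache. The names `node_t`, `helper`, `sum_with_preexecution`, `demo_sum`, and `RUNAHEAD_DIST` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical list node: an irregular, pointer-linked structure whose
   addresses are only known by executing the pointer chase itself. */
typedef struct node {
    struct node *next;
    long value;
} node_t;

/* Assumed bound on how many nodes the helper may run ahead. */
#define RUNAHEAD_DIST 64

typedef struct {
    node_t *head;
    atomic_long main_pos; /* nodes already consumed by the main thread */
    atomic_int done;      /* set once the main thread has finished */
} shared_t;

/* Helper thread: runs only the address-generating slice (p = p->next) and
   issues non-binding prefetches; it never modifies program state. */
static void *helper(void *arg) {
    shared_t *s = arg;
    long pos = 0;
    for (node_t *p = s->head; p != NULL && !atomic_load(&s->done);
         p = p->next, pos++) {
        /* Throttle so prefetched lines are not evicted before use. */
        while (pos - atomic_load(&s->main_pos) > RUNAHEAD_DIST &&
               !atomic_load(&s->done))
            sched_yield();
        /* Prefetching NULL at the list tail is harmless: prefetches do not fault. */
        __builtin_prefetch(p->next, 0, 3);
    }
    return NULL;
}

/* Main thread: the real computation. It publishes its progress so the
   helper can stay a bounded distance ahead. Correct even without a helper. */
long sum_with_preexecution(node_t *head) {
    shared_t s;
    s.head = head;
    atomic_init(&s.main_pos, 0);
    atomic_init(&s.done, 0);

    pthread_t tid;
    int have_helper = (pthread_create(&tid, NULL, helper, &s) == 0);

    long sum = 0, pos = 0;
    for (node_t *p = head; p != NULL; p = p->next) {
        sum += p->value;
        atomic_store_explicit(&s.main_pos, ++pos, memory_order_relaxed);
    }
    atomic_store(&s.done, 1);
    if (have_helper)
        pthread_join(tid, NULL); /* helper only reads the list */
    return sum;
}

static void free_list(node_t *p) {
    while (p != NULL) {
        node_t *next = p->next;
        free(p);
        p = next;
    }
}

/* Demo: build an n-node list holding 0..n-1, sum it, then free it.
   Returns -1 if allocation fails. */
long demo_sum(long n) {
    assert(n >= 0);
    node_t *head = NULL;
    for (long i = n - 1; i >= 0; i--) {
        node_t *nd = malloc(sizeof *nd);
        if (nd == NULL) {
            free_list(head);
            return -1;
        }
        nd->value = i;
        nd->next = head;
        head = nd;
    }
    long sum = sum_with_preexecution(head);
    free_list(head); /* safe: the helper was joined inside the call */
    return sum;
}
```

For example, `demo_sum(10000)` returns 49995000, the same result as a plain traversal. The `RUNAHEAD_DIST` throttle reflects a common design concern: a helper that runs too far ahead can evict lines before the main thread uses them. Any benefit depends on the list being large and scattered in memory and on both threads sharing a cache (for instance, two SMT contexts of one core); the sketch itself makes no performance claim.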
Memory access latency is a main bottleneck limiting further improvement of multi-core proces...
Prefetching, i.e., exploiting the overlap of processor computations with data accesses, is one of s...
As the gap between processor and memory speeds widens, program performance is increasingly dependent...
This paper describes future execution (FE), a simple hardware-only technique to accelerate individu...
Simultaneous Multithreading (SMT) has been proposed for improving processor throughput by overlappin...
Scaling the performance of applications with little thread-level parallelism is one of the most seri...
This paper proposes a new hardware technique for using one core of a CMP to prefetch data for a thr...
Instruction cache miss latency is becoming an increasingly important performance bottleneck, especia...
In this paper, we examine the way in which prefetching can exploit parallelism. Prefetching has been st...
A multiprocessor prefetch scheme is described in which a miss is followed by a prefetch of a group o...
Threads experiencing long-latency loads on a simultaneous multithreading (SMT) processor ...
In this paper, we propose Runahead Threads (RaT) as a valuable solution for both reducing resource c...
Threads experiencing long-latency loads on a simultaneous multithreading (SMT) processor may clog sh...
A simultaneous multithreaded (SMT) processor is able to issue and execute instructions from several ...