Abstract: A single parallel application running on a multi-core system shows sub-linear speedup because of the slow progress of one or more threads, known as critical threads. Some of the reasons for the slow progress of threads are (1) load imbalance, (2) frequent cache misses, and (3) the effect of synchronization primitives. Identifying critical threads and minimizing their cache miss latencies can improve the overall execution time of a program. One way to hide and tolerate cache misses is through hardware prefetching, one of the most commonly used memory latency hiding techniques. Previous studies have shown the effectiveness of hardware prefetchers for multiprogrammed workloads (multiple sequential applications runni...
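The abstract above names hardware prefetching as a way to hide cache-miss latency. As an illustration only, the sketch below models a simple stride prefetcher (a common textbook design, not necessarily the one evaluated in this work) placed in front of a tiny fully associative LRU cache. The 64-byte line size, the 8-line cache, the prefetch degree of 2, and the names `Cache`, `run`, and `LINE` are all hypothetical. Real stride prefetchers usually track strides per load instruction (PC); this sketch uses a single global stride for simplicity.

```python
from collections import OrderedDict

LINE = 64  # hypothetical cache-line size in bytes


class Cache:
    """Tiny fully associative LRU cache holding line numbers (illustrative model only)."""

    def __init__(self, lines):
        self.lines = lines
        self.store = OrderedDict()  # ordering tracks recency: last item = most recently used

    def access(self, line):
        """Demand access: return True on a hit; on a miss, fill the line and return False."""
        if line in self.store:
            self.store.move_to_end(line)
            return True
        self.insert(line)
        return False

    def insert(self, line):
        """Fill a line (demand or prefetch), evicting the least recently used line if full."""
        self.store[line] = None
        self.store.move_to_end(line)
        if len(self.store) > self.lines:
            self.store.popitem(last=False)


def run(addresses, prefetch, cache_lines=8, degree=2):
    """Count demand misses for a byte-address stream, optionally with a stride prefetcher."""
    cache = Cache(cache_lines)
    misses = 0
    last_line = None
    last_stride = None
    for addr in addresses:
        line = addr // LINE
        if not cache.access(line):
            misses += 1
        if prefetch and last_line is not None:
            stride = line - last_line
            # Prefetch only after the same non-zero stride has been seen twice in a row.
            if stride != 0 and stride == last_stride:
                for k in range(1, degree + 1):
                    cache.insert(line + k * stride)
            last_stride = stride
        last_line = line
    return misses


sequential = [i * LINE for i in range(100)]
print(run(sequential, prefetch=False), run(sequential, prefetch=True))
```

On this purely sequential stream of 100 line-sized accesses, the sketch counts 100 misses without prefetching and 3 with it: the first three accesses miss while the stride is learned and confirmed, and every later access hits on a line that was prefetched ahead of the demand.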
Modern processors and compilers hide long memory latencies through non-blocking loads or explicit so...
In the last century, great progress was achieved in developing processors with extremely high computa...
Processor performance has increased far faster than memories have been able to keep up with, forcing...
In multi-core systems, an application's prefetcher can interfere with the memo...
In multi-core systems, prefetch requests of one core interfere with the demand...
A well-known performance bottleneck in computer architecture is the so-called memory wall. This term...
Scaling the performance of applications with little thread-level parallelism is one of the most seri...
Analyzing multi-threaded programs is quite challenging, but is necessary to obtain good multicore pe...
This paper presents new analytical models of the performance benefits of multithreading and prefetc...
In this dissertation, we provide hardware solutions to increase the efficiency of the cache hierarch...
A major performance limiter in modern processors is the long latency caused by data cache misses. ...
Instruction cache miss latency is becoming an increasingly important performance bottleneck, especia...
Pre-execution attacks cache misses for which conventional address-prediction-driven prefetching is i...
Prefetching is an important technique for reducing the average latency of memory accesses in scalabl...
One of the significant issues in processor architecture is overcoming memory latency. Prefetching ca...