Abstract—Modern processors are equipped with multiple hardware prefetchers, each of which targets a distinct level in the memory hierarchy and employs a separate prefetching algorithm. However, different programs require different subsets of these prefetchers to maximize their performance. Turning on all available prefetchers rarely yields the best performance and, in some cases, prefetching even hurts performance. This paper studies the effect of hardware prefetching on multithreaded code and presents a machine-learning technique to predict the optimal combination of prefetchers for a given application. This technique is based on program characterization and utilizes hardware performance events in conjunction with a pruning algorithm to ob...
A well-known performance bottleneck in computer architecture is the so-called memory wall. This term...
Despite large caches, main-memory access latencies still cause significant performance losses in man...
In multi-core systems, an application's prefetcher can interfere with the memo...
Modern architectures provide hardware memory prefetching capabilities which can be configured at run...
Hardware prefetching on IBM’s latest POWER8 processor is able to improve performance of many applica...
A major performance limiter in modern processors is the long latencies caused by data cache misses. ...
Data prefetching is an effective way to bridge the increasing performance gap ...
Current multi-core processors implement sophisticated hardware prefetchers that can be configu...
The widely acknowledged performance gap between processors and memory has been the subject of much r...
Hardware predictors are widely used to improve the performance of modern processors. These predictor...
In the last century great progress was achieved in developing processors with extremely high computa...
An important technique for alleviating the memory bottleneck is data prefetching. Data prefetching ...
This paper presents new analytical models of the performance benefits of multithreading and prefetc...
Prefetching, i.e., exploiting the overlap of processor computations with data accesses, is one of s...