Hardware data prefetcher engines have been extensively used to reduce the impact of memory latency. However, microprocessors' hardware prefetcher engines do not include any automatic hardware control able to dynamically tune their operation. This missing architectural feature causes systems to operate with prefetchers in a fixed configuration, which in many cases harms performance and energy consumption. This paper presents a piece of software that solves this problem in the context of the IBM POWER7 microprocessor. The proposed solution uses the runtime software as a bridge that characterizes the workload of user applications and dynamically reconfigures the prefetcher engine. The proposed mechanism has ...
Performance-enhancement techniques improve CPU speed, but at higher cost to other valuable system r...
This work has been partially supported by the Spanish Ministry of Science and Innovation under grant...
Memory latency has always been a major issue in shared-memory multiprocessors and high-speed systems...
Hardware prefetching on IBM’s latest POWER8 processor is able to improve performance of many applica...
Data prefetching is an effective way to bridge the increasing performance gap ...
Performance-enhancement techniques improve CPU speed, but at higher cost to other valuable system re...
Memory bandwidth is a crucial resource in computing systems. Current CMP/SMT processors have a signi...
Current microprocessors include hardware to optimize some specific workloads. In general, these har...
Many modern workloads compute on large amounts of data, often with irregular memory accesses. Curren...
Extensive research has been done in prefetching techniques that hide memory latency in microprocesso...
Modern processors are equipped with multiple hardware prefetchers, each of which targets a ...
Current microprocessors include several knobs to modify the hardware behavior in order to improve pe...
A major performance limiter in modern processors is the long latencies caused by data cache misses. ...
Many modern data processing and HPC workloads are heavily memory-latency bound. A tempting propositi...