We propose Instruction-based Prediction as a means to optimize directory-based cache-coherent NUMA shared memory. Instruction-based prediction observes the behavior of load and store instructions in relation to coherence events and predicts their future behavior. Although this technique is well established in the uniprocessor world, it has not been widely applied to optimizing transparent shared memory. Typically, in this environment, prediction is based on data-block access history (address-based prediction) in the form of adaptive cache coherence protocols. The advantage of instruction-based prediction is that it requires few hardware resources, in the form of small prediction structures per node, to match (or exceed) the perf...
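The per-node prediction structure described above can be sketched in a few lines. This is a minimal illustration only, not the mechanism from the paper: it assumes a hypothetical direct-mapped table of 2-bit saturating counters, indexed by the PC of a load or store, that predicts whether that instruction will trigger a coherence event.

```python
class InstructionPredictor:
    """Hypothetical sketch: PC-indexed table of 2-bit saturating counters.

    Predicts whether a given load/store instruction will cause a
    coherence event, based on what it did in the past.
    """

    def __init__(self, entries=256):
        self.entries = entries
        # Start all counters at 1 (weakly predict "no coherence event").
        self.counters = [1] * entries

    def _index(self, pc):
        # Direct-mapped by instruction address.
        return pc % self.entries

    def predict(self, pc):
        # Counter states 2 and 3 predict a coherence event, so the node
        # could, e.g., speculatively request exclusive ownership early.
        return self.counters[self._index(pc)] >= 2

    def train(self, pc, coherence_event):
        # After the instruction executes, update the counter with the
        # observed outcome (saturating at 0 and 3).
        i = self._index(pc)
        if coherence_event:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

The 2-bit hysteresis means a single anomalous access does not flip the prediction, mirroring the classic uniprocessor branch-predictor design the abstract alludes to.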
For many programs, especially integer codes, untolerated load instruction latencies account for a si...
Recent works have proposed the use of prediction techniques to execute speculatively true...
While runahead execution is effective at parallelizing independent long-latency cache misses, it is ...
The increasing speed gap between processor microarchitectures and memory technologies can potentiall...
Data speculation refers to the execution of an instruction before some logically preceding instruc...
Modern processors rely heavily on speculation to provide performance. Techniques such as branch pred...
The continually increasing speed of microprocessors stresses the need for ever faster instruction fe...
Directory hints (DHs) help a node in a cache coherent non-uniform memory (CC-NUMA) share...
It has been claimed that the execution time of a program can often be predicted more accurately on a...
Cache memories are commonly implemented through multiple memory banks to improve bandwidth and laten...
Efficient data supply to the processor is one of the keys to achieving high performance. However, ...
Future processors combining out-of-order execution with aggressive speculation techniques will need ...