In this paper we propose Instruction-based Prediction as a means to optimize directory-based cache-coherent NUMA shared memory. Instruction-based prediction observes the behavior of load and store instructions in relation to coherence events and predicts their future behavior. Although this technique is well established in the uniprocessor world, it has not been widely applied to optimizing transparent shared memory, where prediction, in the form of adaptive cache coherence protocols, is typically address-based. The advantage of this technique is that it requires very few hardware resources: very small prediction tables per node. In contrast, address-based prediction typically requires storage proportional to ...
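The key idea above, indexing a small per-node table by the address of the load/store instruction rather than by data address, can be sketched as follows. This is a minimal illustrative model, not the paper's exact design: the table size, the 2-bit saturating-confidence scheme, and all names here are assumptions for the sake of the example.

```python
class InstructionPredictor:
    """Sketch of an instruction-based (PC-indexed) coherence predictor.

    A small fixed-size table is indexed by the program counter of a
    load/store instruction; each entry records the last coherence
    pattern observed for that instruction plus a saturating confidence
    counter. (Illustrative only; details are assumed, not from the paper.)
    """

    def __init__(self, size=64):
        self.size = size   # small per-node table, e.g. 64 entries
        self.table = {}    # index -> (predicted_pattern, confidence)

    def _index(self, pc):
        return pc % self.size  # direct-mapped by instruction address

    def predict(self, pc):
        entry = self.table.get(self._index(pc))
        # Only predict when confidence is high enough (counter >= 2).
        if entry and entry[1] >= 2:
            return entry[0]
        return None  # no confident prediction: fall back to base protocol

    def update(self, pc, observed_pattern):
        idx = self._index(pc)
        pattern, conf = self.table.get(idx, (observed_pattern, 0))
        if pattern == observed_pattern:
            conf = min(conf + 1, 3)        # reinforce correct history
        else:
            conf -= 1                      # weaken on misprediction
            if conf < 0:                   # replace after repeated misses
                pattern, conf = observed_pattern, 0
        self.table[idx] = (pattern, conf)
```

Because the table is indexed by instruction address, its cost is independent of the data set size, which is the contrast with address-based prediction drawn in the abstract.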
Recent research advocates using general message predictors to learn and predict the coherence activi...
Mitigating the effect of the large latency of load instructions is one of the challenges of micro-proces...
While runahead execution is effective at parallelizing independent long-latency cache misses, it is ...
The increasing speed gap between processor microarchitectures and memory technologies can potentiall...
Data speculation refers to the execution of an instruction before some logically preceding instruc...
Efficient data supply to the processor is one of the keys to achieving high performance. However, ...
Cache memories are commonly implemented through multiple memory banks to improve bandwidth and laten...
Modern processors rely heavily on speculation to provide performance. Techniques such as branch pred...
Caches are partitioned into subarrays for optimal timing. In a set-associative cache, if the way hold...
Recent works have proposed the use of prediction techniques to execute speculatively true...
The continually increasing speed of microprocessors stresses the need for ever faster instruction fe...
Cache interference is found to play a critical role in optimizing cache allocation among concurr...
For many programs, especially integer codes, untolerated load instruction latencies account for a si...
Storing instructions in caches has led to dramatic increases in the speed at which programs can exec...