Even sophisticated branch-prediction techniques necessarily suffer some mispredictions, and even relatively small mispredict rates hurt performance substantially in current-generation processors. In this paper, we investigate schemes for improving performance in the face of imperfect branch predictors by having the processor simultaneously execute code from both the taken and not-taken outcomes of a branch. This paper presents data regarding the limits of multipath execution, considers fetch-bandwidth needs for multipath execution, and discusses various dynamic confidence-prediction schemes that gauge the likelihood of branch mispredictions. Our evaluations consider executing along several (2–8) paths at once. Using 4 paths and a relatively...
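The abstract above pairs multipath execution with dynamic confidence prediction. A minimal sketch of how such a pairing could work uses a resetting-counter confidence estimator, which is one well-known dynamic scheme and not necessarily any of the schemes this paper evaluates. In this sketch, a per-branch counter increments on each correct prediction and resets to zero on a misprediction. The alternate path is forked only when the branch is low-confidence and a path slot is free. The table size, the PC-indexing function, the threshold of 8, the 4-bit saturation limit, and the `max_paths=4` cap are all hypothetical parameters chosen for illustration. The helper names (`ResettingConfidenceEstimator`, `should_fork`) are likewise hypothetical.

```python
class ResettingConfidenceEstimator:
    """Hypothetical table of resetting confidence counters indexed by branch PC.

    Each counter counts consecutive correct predictions for the branches that
    map to it; a misprediction resets it to zero. A branch is treated as
    high-confidence once its counter reaches `threshold`.
    """

    def __init__(self, table_bits: int = 10, counter_max: int = 15, threshold: int = 8):
        self.table_bits = table_bits
        self.counter_max = counter_max  # saturation limit (assumed 4-bit counter)
        self.threshold = threshold      # assumed confidence threshold
        self.counters = [0] * (1 << table_bits)

    def _index(self, pc: int) -> int:
        # Drop the low 2 bits (assumes 4-byte-aligned instructions), keep table_bits bits.
        return (pc >> 2) & ((1 << self.table_bits) - 1)

    def is_confident(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= self.threshold

    def update(self, pc: int, prediction_correct: bool) -> None:
        i = self._index(pc)
        if prediction_correct:
            self.counters[i] = min(self.counters[i] + 1, self.counter_max)
        else:
            self.counters[i] = 0  # resetting counter: any misprediction clears confidence


def should_fork(est: ResettingConfidenceEstimator, pc: int,
                active_paths: int, max_paths: int = 4) -> bool:
    """Fork the alternate path only for low-confidence branches when a path slot is free."""
    return not est.is_confident(pc) and active_paths < max_paths


if __name__ == "__main__":
    est = ResettingConfidenceEstimator()
    pc = 0x400A10
    print(should_fork(est, pc, active_paths=1))  # True: counter starts at 0
    for _ in range(8):
        est.update(pc, prediction_correct=True)
    print(should_fork(est, pc, active_paths=1))  # False: counter reached the threshold
    est.update(pc, prediction_correct=False)
    print(should_fork(est, pc, active_paths=4))  # False: all 4 path slots are busy
```

A resetting counter is deliberately pessimistic: a single misprediction drops the branch back to low confidence. This biases the sketch toward spending spare path slots on branches that have recently mispredicted.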
Branch prediction in simultaneous multithreaded processors is difficult because multiple i...
Executing multiple threads has proved to be an effective solution to partially hide laten...
Meeting the future requirements of higher bandwidth while providing ever more complex functions, fut...
In this paper, we examined the behavior of three of the best-performing branch prediction strategies...
In present-day computer architectures, speculative execution is the general and effective way to hand...
This work presents a hybrid branch predictor scheme that uses a limited form of dual-path execution ...
In simultaneous multithreaded architectures, many separate threads are running concurrently, sharing ...
Selective Dual Path Execution (SDPE) reduces branch misprediction penalties by selectively forking a...
This paper presents Threaded Multi-Path Execution (TME), which exploits existing hardware on a Simul...
Future processors combining out-of-order execution with aggressive speculation techniques will need ...
A simultaneous multithreaded (SMT) processor is able to issue and execute instructions from several ...
Branch prediction accuracy is a very important factor for superscalar processor performance. The abi...
Conventional speculative architectures use branch prediction to evaluate the most likely execution p...
As the issue width and depth of pipelining of high-performance superscalar processors increase, the ...