This work presents several techniques for enlarging instruction streams. We define a stream as a sequence of instructions from the target of a taken branch to the next taken branch, potentially containing multiple basic blocks. Because streams are long, a stream-based fetch engine can provide high fetch bandwidth, achieving performance comparable to that of a trace cache. Long streams also enable the next stream predictor to tolerate the prediction table access latency. Enlarging instruction streams therefore improves the behavior of a stream-based fetch engine. We provide a comprehensive analysis of dynamic instruction streams, showing that focusing on particular kinds of...
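The stream definition above can be illustrated with a minimal sketch that splits a dynamic instruction trace into streams. This is not the authors' implementation: the `Instr` and `Stream` types, the `split_into_streams` function, and the trace encoding are all hypothetical names chosen for illustration, and the basic-block count is approximated by counting the branches inside each stream.

```python
from typing import Iterable, List, NamedTuple, Optional


class Instr(NamedTuple):
    """One dynamic instruction (hypothetical trace record)."""
    pc: int
    is_branch: bool = False
    taken: bool = False


class Stream(NamedTuple):
    """Summary of one instruction stream (hypothetical record)."""
    start_pc: int      # target of the previous taken branch
    length: int        # number of instructions in the stream
    basic_blocks: int  # approximated as the number of branches in the stream


def split_into_streams(trace: Iterable[Instr]) -> List[Stream]:
    """Cut a dynamic trace at every taken branch.

    Assumption: the first instruction of the trace starts a stream.
    A trailing run that does not end in a taken branch is incomplete
    and is not reported.
    """
    streams: List[Stream] = []
    start: Optional[int] = None
    length = 0
    blocks = 0
    for ins in trace:
        if start is None:
            start = ins.pc  # first instruction after a taken branch
        length += 1
        if ins.is_branch:
            blocks += 1  # each branch, taken or not, closes a block
            if ins.taken:
                # A taken branch terminates the current stream.
                streams.append(Stream(start, length, blocks))
                start, length, blocks = None, 0, 0
    return streams
```

Under these assumptions, not-taken branches stay inside a stream, which is why a single stream can span several basic blocks and why making more branches fall through lengthens streams.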
We explore the use of compiler optimizations, which optimize the layout of instructions in memory. T...
Achieving high instruction issue rates depends on the ability to dynamically predict branches. We co...
As the issue width and depth of pipelining of high performance superscalar processors increase, the ...
The stream fetch engine is a high-performance fetch architecture based on the concept of an instruct...
The next stream predictor is an accurate branch predictor that provides stream level sequencing. Eve...
Fetch performance is a very important factor because it effectively limits the overall processor per...
The access latency of branch predictors is a well known problem of fetch engine design. Prediction o...
Fetch engine performance is seriously limited by the branch prediction table access latency. This fa...
The design of higher performance processors has been following two major trends: increasing the pipe...
A sequence of branch instructions in the dynamic instruction stream forms a branch sequence if at mo...
L1 instruction-cache misses pose a critical performance bottleneck in commercial server workloads. C...
The continually increasing speed of microprocessors stresses the need for ever faster instruction fe...
Future processors combining out-of-order execution with aggressive speculation techniques will need ...