This work presents several techniques for enlarging instruction streams. We use the term stream to refer to a sequence of instructions from the target of a taken branch to the next taken branch, potentially containing multiple basic blocks. The length of instruction streams makes it possible for a fetch engine based on streams to provide high fetch bandwidth, achieving performance comparable to a trace cache. The length of streams also enables the next stream predictor to tolerate the prediction table access latency. Therefore, enlarging instruction streams improves the behavior of a stream-based fetch engine. We provide a comprehensive analysis of dynamic instruction streams, showing that focusing on particular kinds of...
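As a minimal illustration of the stream definition above, a dynamic instruction trace can be segmented into streams at taken branches. The trace format here (a list of `(pc, taken)` pairs) is a hypothetical simplification for the sketch, not the representation used in the work itself:

```python
# Segment a dynamic instruction trace into streams: each stream runs
# from the target of a taken branch up to and including the next
# taken branch, and may span multiple basic blocks.
# Hypothetical trace format: (pc, taken) pairs, where `taken` is True
# when the instruction is a branch that is taken.

def split_into_streams(trace):
    streams = []
    current = []
    for pc, taken in trace:
        current.append(pc)
        if taken:                 # a taken branch ends the current stream
            streams.append(current)
            current = []
    if current:                   # trailing instructions form a final stream
        streams.append(current)
    return streams

# Example: two streams; the branches at 0x1 and 0x9 are taken.
trace = [(0x0, False), (0x1, True), (0x8, False), (0x9, True)]
print(split_into_streams(trace))  # [[0, 1], [8, 9]]
```

Longer streams mean fewer stream boundaries per fetched instruction, which is why enlarging streams raises the fetch bandwidth a stream-based engine can sustain.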
The design of higher performance processors has been following two major trends: increasing the pipe...
L1 instruction-cache misses pose a critical performance bottleneck in commercial server workloads. C...
In the pursuit of instruction-level parallelism, significant demands are placed on a processor's ins...
The stream fetch engine is a high-performance fetch architecture based on the concept of an instruct...
The next stream predictor is an accurate branch predictor that provides stream level sequencing. Eve...
Fetch engine performance is seriously limited by the branch prediction table access latency. This fa...
Fetch performance is a very important factor because it effectively limits the overall processor per...
A sequence of branch instructions in the dynamic instruction stream forms a branch sequence if at mo...
The access latency of branch predictors is a well known problem of fetch engine design. Prediction o...
The continually increasing speed of microprocessors stresses the need for ever faster instruction fe...
Future processors combining out-of-order execution with aggressive speculation techniques will need ...
Achieving high instruction issue rates depends on the ability to dynamically predict branches. We co...
Fetch engine performance is a key topic in superscalar processors, since it limits the instructionle...