Instruction fetch bandwidth is feared to be a major limiting factor to the performance of future wide-issue aggressive superscalars. In this paper, we focus on database applications running decision support workloads. We characterize the locality patterns of a database kernel and find frequently executed paths. Using this information, we propose an algorithm to lay out the basic blocks for improved I-fetch. Our results show a miss reduction of 60-98% for realistic I-cache sizes and a doubling of the number of instructions executed between taken branches. As a consequence, we increase the fetch bandwidth provided by an aggressive sequential fetch unit from 5.8 for the original code to 10.6 using our proposed layout. Our software scheme combi...
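To make the idea of profile-guided basic block layout concrete, here is a minimal sketch in Python. It is not the algorithm proposed in the paper, whose details are not given in this abstract; it is a generic greedy chaining heuristic that places each block's hottest unplaced successor directly after it. The function names, the edge-profile format, and the simplification that a transition counts as a taken branch whenever the next block is not laid out immediately after the current one are all assumptions made for illustration.

```python
from collections import defaultdict


def greedy_layout(blocks, edge_counts, entry):
    """Return an ordering of basic blocks that favors hot fall-through paths.

    blocks:      dict mapping block name -> instruction count (hypothetical format)
    edge_counts: dict mapping (src, dst) -> profiled execution count
    entry:       name of the entry block, always placed first
    """
    # For each block, list its successors from hottest to coldest.
    succs = defaultdict(list)
    incoming = defaultdict(int)
    for (src, dst), count in edge_counts.items():
        succs[src].append((count, dst))
        incoming[dst] += count
    for src in succs:
        succs[src].sort(key=lambda cd: (-cd[0], cd[1]))

    # Start a chain at the entry, then at the remaining blocks, hottest first.
    seeds = [entry] + sorted(
        (b for b in blocks if b != entry), key=lambda b: (-incoming[b], b)
    )

    placed, layout = set(), []
    for seed in seeds:
        cur = seed
        while cur is not None and cur not in placed:
            placed.add(cur)
            layout.append(cur)
            # Follow the hottest successor that has not been placed yet.
            cur = next((dst for _, dst in succs[cur] if dst not in placed), None)
    return layout


def instructions_per_taken_branch(trace, layout, blocks):
    """Average instructions executed between taken branches for a block trace.

    Assumption: a transition a -> b is a taken branch unless b is placed
    immediately after a in the layout.
    """
    position = {b: i for i, b in enumerate(layout)}
    total = sum(blocks[b] for b in trace)
    taken = sum(
        1 for a, b in zip(trace, trace[1:]) if position[b] != position[a] + 1
    )
    return total / max(taken, 1)


# Toy example: a loop whose hot path is A -> C -> D and whose cold path is A -> B -> D.
blocks = {"A": 4, "B": 2, "C": 4, "D": 3}
edge_counts = {("A", "C"): 90, ("A", "B"): 10, ("B", "D"): 10,
               ("C", "D"): 90, ("D", "A"): 99}
trace = ["A", "C", "D"] * 9 + ["A", "B", "D"]

original = ["A", "B", "C", "D"]
optimized = greedy_layout(blocks, edge_counts, "A")
print(optimized)
print(instructions_per_taken_branch(trace, original, blocks))
print(instructions_per_taken_branch(trace, optimized, blocks))
```

In this toy profile the heuristic moves the cold block B out of the hot path, so the hot loop body runs as straight-line code and the average run of instructions between taken branches grows. Real layout schemes must also handle procedure boundaries, branch inversion, and I-cache mapping conflicts, none of which this sketch models.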
Recent studies highlight that traditional transaction processing systems utilize the micro-architec...
In a dynamic reordering superscalar processor, the front-end fetches instructions and places them in...
Instruction-cache misses account for up to 40% of execution time in online transaction processing (...
Instruction fetch bandwidth is feared to be a major limiting factor to the performance of future wid...
This paper examines the behavior of current and next generation microprocessors’ fetch engines while...
As more and more query processing work can be done in main memory, memory access is becoming a signi...
The design of higher performance processors has been following two major trends: increasing the pipe...
Recent superscalar processors issue four instructions per cycle. These processors are also powered ...
The design of higher performance processors has been following two major trends: increasing the pipe...
Fetch performance is a very important factor because it effectively limits the overall processor per...
The effective performance of wide-issue superscalar processors depends on many parameters, such as b...
We explore the use of compiler optimizations, which optimize the layout of instructions in memory. T...
To maximize the performance of a wide-issue superscalar processor, the fetch mechanism must be capab...
Commercial applications such as databases and Web servers constitute the most important market segme...
Fetch engine performance is seriously limited by the branch prediction table access latency. This fa...