The large instruction working sets of private and public cloud workloads lead to frequent instruction cache misses and costs in the millions of dollars. While prior work has identified the growing importance of this problem, to date, there has been little analysis of where the misses come from, and what the opportunities are to improve them. To address this challenge, this paper makes three contributions. First, we present the design and deployment of a new, always-on, fleet-wide monitoring system, AsmDB, that tracks front-end bottlenecks. AsmDB uses hardware support to collect bursty execution traces, fleet-wide temporal and spatial sampling, and sophisticated offline post-processing to construct full-program dynamic control-flow graphs. S...
For several decades, online transaction processing has been one of the main applications that drive...
As the degree of instruction-level parallelism in superscalar architectures increases, the gap betwe...
The increasing performance gap between processors and memory will force future architectures to devo...
As more and more query processing work can be done in main memory, memory access is becoming a signi...
Instruction-cache misses account for up to 40% of execution time in online transaction processing (...
Commercial applications such as databases and Web servers constitute the most important market segme...
We explore the use of compiler optimizations, which optimize the layout of instructions in memory. T...
Recent technology advances have enabled computerized services, which have proliferated, leading to a tremen...
L1 instruction-cache misses pose a critical performance bottleneck in commercial server workloads. C...
Cache misses represent a major bottleneck in embedded systems performance. Traditionally, compilers ...
The large number of cache misses of current applications coupled with the increasing cache miss late...
The cache Miss Ratio Curve (MRC) serves a variety of purposes such as cache partitioning, applicatio...
The memory system is often the weakest link in the performance of today’s computers. Cache design ha...
Modern processors apply sophisticated techniques, such as deep cache hierarchies and hardware prefet...
The growing complexity of modern computer architectures increasingly compli...