Developers and architects spend a lot of time trying to understand and eliminate performance problems. Unfortunately, the root causes of many problems occur at a fine granularity that existing continuous profiling and direct measurement approaches cannot observe. This paper presents the design and implementation of Shim, a continuous profiler that samples at resolutions as fine as 15 cycles, three to five orders of magnitude finer than current continuous profilers. Shim's fine-grain measurements reveal new behaviors, such as variations in instructions per cycle (IPC) within the execution of a single function. A Shim observer thread executes and samples autonomously on unutilized hardware. To sample, it reads hardware performance counters an...
Traditional benchmarks such as SPEC model a simple workload: a single address space, a...
Concurrency levels in large-scale, distributed-memory supercomputers are rising exponentially. Moder...
This paper describes ongoing work on measuring the performance of an application running on a machin...
Profiling is the most popular approach to diagnosing performance problems of computer systems. Profi...
This paper describes the DIGITAL Continuous Profiling Infrastructure, a sampling-based profiling sys...
Computers perform different applications in different ways. To characterize an application performan...
Modern memory systems play a critical role in the performance of applications, but a detailed unders...
The gap between peak and delivered performance for scientific applications running on microprocessor...
The gap between peak and delivered performance for scientific applications running on microprocesso...
The many configuration options of modern applications make it difficult for users to select a perfor...
One of the major architectural design considerations for any computer system is that of the memory s...
Performance evaluation tools enable analysts to shed light on how applications behave both from a ge...
The complexity of modern software makes it difficult to ship correct programs. Errors can cost money...
We introduce the usage of hardware performance counters (HPCs) as a new method that allows very prec...
Memory contention is one of the largest sources of inter-core interference in statically partitioned...