This paper proposes a methodology for analyzing parallel performance by building cycle stacks. A cycle stack quantifies where the cycles have gone, and provides hints towards optimization opportunities. We make the case that this is particularly interesting for analyzing parallel performance: understanding how cycle components scale with increasing core counts and/or input data set sizes leads to insight with respect to scaling bottlenecks due to synchronization, load imbalance, poor memory performance, etc. We present several case studies illustrating the use of cycle stacks. As a subsequent step, we further extend the methodology to analyze sets of parallel workloads using statistical data analysis, and perform a workload characterization...
The shift towards multicore processing has led to a much wider population of developers being faced ...
The use of parallelism enhances the performance of a software system. However, its excessive use can...
Over the past 10 years we have seen the transition from single-core to multicore computing,...
Multi-threaded workloads typically show sublinear speedup on multi-core hardware, i.e., the achieved...
Estimating the potential performance of parallel applications on the yet-to-b...
High-performance computing systems have become increasingly dynamic, complex, and unpredictable. To ...
Systems for high performance computing are getting increasingly complex. On the one hand, the number...
Analyzing multi-threaded programs is quite challenging, but is necessary to obtain good multicore pe...
Understanding and analyzing multi-threaded program performance and scalability is far from trivial, ...
Most performance debugging and tuning of parallel programs is based on the "measure-modify"...
Traditional performance debugging and tuning of parallel programs is based on the "measure-modify" a...
Although there are many situations in which a model of application performance is valuable, performa...
The Parsec benchmark suite is widely used in evaluation of parallel architectures, both existing and...
Performance analysis of parallel programs continues to be challenging for programmers. Programmers h...