Driven by growing application requirements and accelerated by current trends in microprocessor design, the number of processor cores on modern supercomputers is increasing from generation to generation. However, load or communication imbalance prevents many codes from taking advantage of the available parallelism, as delays of single processes may spread wait states across the entire machine. Moreover, when employing complex point-to-point communication patterns, wait states may propagate along far-reaching cause-effect chains that are hard to track manually and that complicate an assessment of the actual costs of an imbalance. Building on earlier work by Meira, Jr., et al., we present a scalable approach that identifies program wait states...
Traditional implementations of conditional critical regions and monitors can lead to unproductive "b...
This paper describes a general framework and several specific techniques for cause-effect analysis: ...
Accesses to shared resources in multi-core systems raise predictability issues. The delay in accessi...
Driven by growing application requirements and accelerated by current trends in microprocessor desig...
The amount of parallelism in modern supercomputers currently grows from generation to generation, an...
Due to the available concurrency in modern-day supercomputers, the complexity of developing efficien...
The amount of parallelism in modern supercomputers currently grows from generation to generation. Fu...
To better understand the formation of wait states in MPI programs and to support the user in finding...
Load imbalance usually introduces wait states into the execution of parallel programs. Being able to...
Applications must scale well to make efficient use of today’s class of petascale computers,...
Through analysis and experiments, this paper investigates two-phase waiting algorithms to minimize t...
We propose a new class of profiler for distributed and heterogeneous systems. In these sys...
Performance analysis is an essential part of the development process of HPC applications. Thus, deve...
In this thesis, we studied the behavior of parallel programs to understand how to automate the task...
Event traces are valuable for understanding the behavior of parallel programs. However, aut...