Load imbalance usually introduces wait states into the execution of parallel programs. Being able to identify and quantify wait states is therefore essential for the diagnosis and remediation of this phenomenon. An established method of detecting wait states is to generate event traces and compare relevant timestamps across process boundaries. However, large trace volumes usually prevent the analysis of longer execution periods. In this paper, we present an extremely lightweight wait-state profiler that does not rely on traces and can be used to estimate wait states in MPI codes with arbitrarily long runtimes. The profiler combines scalability with portability and low overhead.
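To make the trace-based baseline concrete, the following is a minimal sketch of how a wait state might be quantified by comparing timestamps across process boundaries. It illustrates only the established tracing approach mentioned above, not the profiler presented in this paper. The `Event` record and the `late_sender_wait` function are hypothetical names introduced for illustration, and the sketch assumes that the clocks of both processes are synchronized. The situation it models, a receiver blocking because the matching send starts later, is commonly called a late-sender pattern in the tracing literature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical trace record for one MPI call on one process."""
    rank: int
    enter: float  # time the process entered the call, in seconds
    exit: float   # time the process left the call, in seconds


def late_sender_wait(send: Event, recv: Event) -> float:
    """Estimate how long the receiver was blocked waiting for the sender.

    Assumes synchronized clocks across processes (an assumption of this
    sketch, not something a real system provides for free).
    """
    # The receiver waits from the moment it enters the receive until the
    # sender enters the matching send.
    wait = send.enter - recv.enter
    # The wait cannot exceed the time actually spent inside the receive,
    # and it is zero if the sender was already there.
    return max(0.0, min(wait, recv.exit - recv.enter))


if __name__ == "__main__":
    send = Event(rank=0, enter=5.0, exit=5.1)
    recv = Event(rank=1, enter=2.0, exit=5.2)
    print(late_sender_wait(send, recv))
```

Because this comparison needs the enter timestamps of both communication partners, every event must be recorded and later matched across processes, which is what makes trace volume grow with execution time.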
In this report we describe how to improve communication time of MPI parallel applications with the u...
This article presents a class library for detecting typical performance problems in event traces of ...
In this paper we propose an API to pause and resume task execution depending on external events. We ...
To better understand the formation of wait states in MPI programs and to support the user in finding...
Driven by growing application requirements and accelerated by current trends in microprocessor desig...
The amount of parallelism in modern supercomputers currently grows from generation to generation. Fu...
The amount of parallelism in modern supercomputers currently grows from generation to generation, an...
Performance analysis is an essential part of the development process of HPC applications. Thus, deve...
Due to the available concurrency in modern-day supercomputers, the complexity of developing efficien...
ABSTRACT: We propose a new class of profiler for distributed and heterogeneous systems. In these sys...
MPI is the de-facto standard message-passing based parallel programming model. However, the bug dete...
Abstract. Performance profiling of MPI programs generates overhead during execution that introduces ...
Through analysis and experiments, this paper investigates two-phase waiting algorithms to minimize t...
Abstract—Applications must scale well to make efficient use of today’s class of petascale computers,...
Utilizing the parallelism offered by multicore CPUs is hard, though profiling and tracing are well-e...