Abstract. A sophisticated approach for the parallel execution of irregular applications on parallel shared memory machines is the decomposition into fine-grained tasks. These tasks can be executed using a task pool which handles the scheduling of the tasks independently of the application. In this paper we present a transparent way to profile irregular applications using task pools without modifying the source code of the application. We show that it is possible to identify critical tasks which prevent scalability and to locate bottlenecks inside the application. We show that the profiling information can be used to determine a coarse estimation of the execution time for a given number of processors.
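As an illustration of the idea, the following minimal Python sketch shows how a task pool can record per-task run times without any change to the task code, and how such records can yield a coarse time estimate for a given number of processors. It is a sketch under stated assumptions, not the implementation described in this paper: the names `ProfilingTaskPool` and `estimate_time` are hypothetical, the pool runs sequentially, and the estimate is a simple lower bound that ignores task dependencies and scheduling overhead.

```python
import time
from collections import deque


class ProfilingTaskPool:
    """Sequential task pool that times every task it executes (hypothetical sketch)."""

    def __init__(self, clock=time.perf_counter):
        self._queue = deque()
        self._clock = clock   # injectable clock, an assumption made for testability
        self.records = []     # list of (task name, duration) pairs

    def submit(self, func, *args):
        """Enqueue a task; the task receives the pool so it can spawn further tasks."""
        self._queue.append((func, args))

    def run(self):
        """Execute tasks in FIFO order, timing each one inside the pool."""
        while self._queue:
            func, args = self._queue.popleft()
            start = self._clock()
            func(self, *args)
            self.records.append((func.__name__, self._clock() - start))


def estimate_time(durations, processors):
    """Coarse lower bound: perfect load balance, but no single task can be split."""
    if not durations:
        return 0.0
    return max(sum(durations) / processors, max(durations))
```

Because the timing lives entirely in `run`, the tasks themselves stay unmodified. In this simplified model, a single long task shows up as the `max(durations)` term dominating the estimate, which is one way a task that limits scalability could be spotted.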
This dissertation presents two new developments in the area of computer program preparation for para...
Within the last decade, microprocessor development reached a point at which higher clock rates and m...
In this article we present a building block technique and a toolkit towards automatic discovery of w...
The popularity of parallel systems for building high performance software only continues to rise. Pr...
To efficiently exploit the resources of new many-core architectures, integrati...
This paper presents scalability as a basis for profiling and performance debugging of parallel progr...
Performance analysis of parallel programs continues to be challenging for programmers. Programmers h...
In this thesis, we studied the behavior of parallel programs to understand how to automate the task...
Abstract—Nowadays, a challenge faced by many developers is the profiling of parallel applications so...
Workload consolidation is a common method to increase resource utilization of the clusters or data c...
Abstract. When computer architects re-invented parallelism through multi-core processors, applicatio...
Nowadays, a challenge faced by many developers is the profiling of parallel ap...
Abstract—Applications must scale well to make efficient use of today’s class of petascale computers,...
Recent trends show a steady increase in the utilization of heterogeneous multicore architectures in ...
Emerging architecture designs include tens of processing cores on a single chip die; it is believed ...