Load balance is one of the critical factors affecting the overall performance of BSP (Bulk Synchronous Parallel) programs. Without sufficient performance profiling information generated by effective profiling tools, it is often difficult to find out to what extent and where load imbalance has occurred in a BSP program. In this paper, we introduce a new parallel performance profiling system for the BSP model. The system traces and generates comprehensive information on the timing and communication of each process in each superstep. Its aim is to assist in improving BSP program performance by identifying load imbalance among processors. The profiling data is visualised via a series of performance profiling graphs, making it easier...
Achieving a significant fraction of peak performance on a modern high-performance computer is a chal...
Standard benchmarking provides the run times for given programs on given machines, but fails to prov...
Performance analysis tools are essential to the maintenance of efficient parallel execution of scien...
A call-graph profiling tool has been designed and implemented to analyse the efficiency of programs ...
High-performance computing is essential for solving large problems and for reducing the time to solu...
The Bulk Synchronous Parallel (BSP) model provides a theoretical framework to accurately predict the...
We present an analytical model that extends BSP to cover both oblivious synchronization and group pa...
Performance analysis of parallel programs continues to be challenging for programmers. Programmers h...
Most performance debugging and tuning of parallel programs is based on the "measure-modify"...
With rising complexity of high performance computing systems and their parallel software, performanc...
Over the past 10 years we have seen the transition from single core computer to multicore computing,...
As the complexity of parallel computers grows, constraints posed by the construction of larger syste...
The amount of parallelism in modern supercomputers currently grows from generation to generation. Fu...
Although there are many situations in which a model of application performance is valuable, performa...