This document outlines a simple method for benchmarking a parallel communication library and for using the results to model the performance of applications developed with that library. We use compositional performance analysis, which decomposes a parallel program into its modular parts and analyzes the performance of each, to gain insight into the performance of the whole program. The resulting model is useful for predicting the execution times of parallel programs based on different archetypes (e.g., mesh and mesh-spectral), using communication libraries built on different message-passing schemes (e.g., Fortran M and Fortran with MPI), and running on different architectures (e.g., IBM SP2 and a network of Pentium personal compu...
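As a rough illustration of the compositional idea described above, the following sketch fits a communication cost model to benchmark timings and then sums per-component costs to estimate a whole-program time. The linear model (time = latency + size × per-byte cost) is an assumption made here for illustration; the abstract does not state which cost model the method uses. The function names `fit_linear_cost` and `predict_time`, and the representation of a program as a list of (compute seconds, message sizes) pairs, are likewise hypothetical.

```python
def fit_linear_cost(sizes, times):
    """Least-squares fit of: time = latency + size * per_byte.

    sizes: message sizes in bytes from a (hypothetical) benchmark run.
    times: measured one-way transfer times in seconds, same order.
    Assumes at least two distinct message sizes.
    """
    n = len(sizes)
    mean_s = sum(sizes) / n
    mean_t = sum(times) / n
    sxx = sum((s - mean_s) ** 2 for s in sizes)
    sxy = sum((s - mean_s) * (t - mean_t) for s, t in zip(sizes, times))
    per_byte = sxy / sxx
    latency = mean_t - per_byte * mean_s
    return latency, per_byte


def predict_time(components, latency, per_byte):
    """Estimate total time as the sum of each component's cost.

    components: list of (compute_seconds, message_sizes) pairs, one per
    modular part of the program (an assumed representation).
    """
    total = 0.0
    for compute_seconds, message_sizes in components:
        total += compute_seconds
        # Each message is charged the fitted latency plus a per-byte cost.
        total += sum(latency + per_byte * m for m in message_sizes)
    return total


if __name__ == "__main__":
    # Synthetic timings generated from a known model, not real measurements.
    sizes = [0, 1000, 2000, 4000]
    times = [1e-4 + 1e-8 * s for s in sizes]
    latency, per_byte = fit_linear_cost(sizes, times)
    program = [(0.5, [1000, 1000]), (0.25, [])]
    print(predict_time(program, latency, per_byte))
```

Summing component costs in this way ignores any overlap between computation and communication, so the estimate is best read as an upper-bound style approximation under that simplification.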
High-performance computing is essential for solving large problems and for reducing the time to solu...
While parallel computing offers an attractive perspective for the future, developing efficient paral...
Many parallel applications suffer from latent performance limitations that may prevent them from sca...
A parallel programming archetype [Cha94, CMMM95] is an abstraction that captures the common features...
In this paper, we describe a model for determining the optimal data and computation decomposition fo...
In this paper, the problem of evaluating the performance of parallel programs ...
Parametric micro-level (PM) performance models are introduced to address the important issue of how ...
Although parallel computers have existed for many years, recently there has been a surge of academic...
This paper discusses the development of a portable suite of benchmarking programs for parallel comp...
Achieving a significant fraction of peak performance on a modern high-performance computer is a chal...
We propose a model for describing the parallel performance of multigrid software on distributed mem...
Scientific programmers must optimize the total time-to-solution, the combination of software develop...
Characterizing the dynamic behavior of parallel programs in terms of their execution prof...
Performance analysis tools are essential to the maintenance of efficient parallel execution of scien...