The Bulk Synchronous Parallel (BSP) model provides a theoretical framework for accurately predicting the execution time of parallel programs. In this paper we describe a BSP programming library and contrast two approaches to analysing performance: (1) a pencil-and-paper method based on a theoretical cost model; (2) a profiling tool that analyses trace information generated during program execution. Both approaches are evaluated on an industrial application code that solves fluid dynamics equations around a complex aircraft geometry, run on an IBM SP2 and an SGI PowerChallenge. We show how the tool can be used to explore the communication patterns of the computational fluid dynamics (CFD) code and accurately predict the performance of the application on any para...
A call-graph profiling tool has been designed and implemented to analyse the efficiency of programs ...
Accurate prediction of fluid flows remains an important field of research and engineering. To this e...
A new set of benchmarks was developed for the performance evaluation of highly parallel supercompute...
We report on practical experience using the Oxford BSP Library to parallelize a large electromagneti...
Load balance is one of the critical factors affecting the overall performance of the BSP (Bulk Syn...
The results presented in this report concern the performance of the AVBP (ver. 2.0) CFD code on the...
This thesis documents the analysis and optimization of a high-order finite difference computational ...
We study the computational, communication, and scalability characteristics of a Computational Fluid ...
Parallelization of high performance computing applications has been a field of active research for q...
Linux PC clusters are a cost-effective platform for parallel computational fluid dynamics (CFD...
The development of Grid environments over recent years now allows scientists access to a range of sh...
This paper continues the work initiated by the authors on the feasibility of using ParaView as visua...
Most performance debugging and tuning of parallel programs is based on the "measure-modify"...
https://doi.org/10.21949/1404009; 1996; PDF; Research Paper; Report NAS-96-004; Computer algorithms; Computer ar...
In this thesis, we investigate the performance and scalability of two CFD proxy applications, based ...