The authors introduce a performance model for parallel, multidimensional wavefront calculations, with machine performance characterized using the LogGP framework. The model accounts for overlap between the communication and computation components. Agreement with experimental data is very good across a variety of problem sizes, data partitionings, and blocking strategies, and on three different parallel architectures. Using the model, the authors analyze the performance of a deterministic transport code on a hypothetical future 100 Tflops parallel system of interest to ASCI.
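As a rough illustration of the kind of estimate such a model produces, the sketch below combines the standard LogGP point-to-point cost (o + (k-1)G + L + o for a k-byte message) with a simple pipeline-fill count for a 2D processor grid. This is not the authors' model: the names `LogGP`, `message_time`, and `wavefront_sweep_time` are hypothetical, and the sketch assumes one message cost per pipeline step and no communication/computation overlap, whereas the model described above does account for overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogGP:
    """LogGP machine parameters (times in seconds)."""
    L: float  # network latency
    o: float  # per-message CPU overhead (send or receive)
    g: float  # minimum gap between messages (not used in this sketch)
    G: float  # gap per byte for long messages
    P: int    # number of processors (not used in this sketch)


def message_time(m: LogGP, nbytes: int) -> float:
    """Standard LogGP cost of one k-byte message: o + (k-1)G + L + o."""
    return m.o + (nbytes - 1) * m.G + m.L + m.o


def wavefront_sweep_time(m: LogGP, px: int, py: int, n_blocks: int,
                         t_block: float, nbytes: int) -> float:
    """Estimate one sweep on a px-by-py processor grid.

    Assumption (illustrative only): each pipeline step costs one block of
    computation plus one message, with no overlap between the two.
    """
    fill_steps = (px - 1) + (py - 1)      # steps before the far corner starts
    step_cost = t_block + message_time(m, nbytes)
    return (fill_steps + n_blocks) * step_cost


if __name__ == "__main__":
    machine = LogGP(L=10e-6, o=2e-6, g=4e-6, G=1e-9, P=16)
    t = wavefront_sweep_time(machine, px=4, py=4, n_blocks=10,
                             t_block=85e-6, nbytes=1001)
    print(f"estimated sweep time: {t * 1e3:.3f} ms")
```

The fill term (px - 1) + (py - 1) captures why smaller blocks shorten pipeline start-up while increasing the number of messages, which is the trade-off that blocking strategies in wavefront codes tune.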
Accurately modeling and predicting performance for large-scale applications becomes increasingly dif...
Faced with the challenge of effectively translating an exponentially growing number of transistors ...
In distributed and vectorized computing there is a large number of highly different supercomputing p...
The authors develop a model for the parallel performance of algorithms that consist of concurrent, t...
This paper develops a plug-and-play reusable LogGP model that can be used to predict the runtime and...
Pipelined wavefront computations are a ubiquitous class of parallel algorithm used for the solution ...
This paper develops a highly accurate LogGP model of a complex wavefront application that uses MPI c...
In this paper we investigate the use of distributed graphics processing unit (GPU)-based architectur...
Pipelined wavefront computations are a ubiquitous class of high performance parallel algorithms use...
We study, using analytic models and simulation, the performance of the multifrontal methods on distr...
This paper details the development and application of a model for predictive performance analysis of...