The authors develop a model for the parallel performance of algorithms that consist of concurrent, two-dimensional wavefronts implemented in a message passing environment. The model, based on a LogGP machine parameterization, combines the separate contributions of computation and communication wavefronts. They validate the model on three important supercomputer systems, on up to 500 processors. They use data from a deterministic particle transport application taken from the ASCI workload, although the model is general to any wavefront algorithm implemented on a 2-D processor domain. They also use the validated model to make estimates of performance and scalability of wavefront algorithms on 100-TFLOPS computer systems expected to be in exis...
Systolic Arrays are a common implementation method for high performance pipelined DSP structures. In...
In this paper we present a parallel wavefront algorithm for computing an alignment between...
This thesis introduces two tools for efficient data access in a wavefront algorithm...
The authors develop a model for the parallel performance of algorithms that consist of concurrent, t...
The authors introduced a performance model for parallel, multidimensional, wavefront calculations wi...
This paper develops a plug-and-play reusable LogGP model that can be used to predict the runtime and...
We study, using analytic models and simulation, the performance of the multifrontal methods on distr...
Pipelined wavefront computations are a ubiquitous class of parallel algorithm used for the solution ...
There is a growing need to accurately simulate physical systems whose evolutions depend on the trans...
In this paper we investigate the use of distributed graphics processing unit (GPU)-based architectur...
This paper develops a highly accurate LogGP model of a complex wavefront application that uses MPI c...
This thesis presents the results of a simulation study of the performance of a message-passing multi...
Although there exist several approaches to rapidly solving the N-body problem, and a diversity of im...