Ever-increasing core counts create the need to develop parallel algorithms that avoid closely coupled execution across all cores. In this paper we present a performance analysis of several parallel asynchronous implementations of Jacobi's method for solving systems of linear equations, using MPI, SHMEM and OpenMP. In particular, we have solved systems of over 4 billion unknowns using up to 32,768 processes on a Cray XE6 supercomputer. We show that the precise implementation details of asynchronous algorithms can strongly affect the resulting performance and convergence behaviour of our solvers in unexpected ways.
In this paper we present an asynchronous branch and bound algorithm for execution on an ...
This work deals with three algorithms for finding solutions to systems of linear equations. They are able to solve ...
We study parallelization of direct methods on shared and distributed memory computers using OpenMP a...
Ever-increasing core counts create the need to develop parallel algorithms that avoid closely-couple...
Ever-increasing core counts create the need to develop parallel algorithms that avoid closely couple...
Ever-increasing core counts create the need to develop parallel algorithms that avoid closely-couple...
In this paper we present two efficient algorithms for the parallel solution of n × n dense l...
The directed acyclic graph (DAG) associated with a parallel algorithm captures the order in which ...
This paper describes a methodology and tools to analyze and optimize the performance of task-based p...
Communication costs are an important factor in the performance of massively parallel algorit...
Asynchronous methods for solving systems of linear equations have been researched since Chazan and M...
In a recent paper B. Vemmer and the authors investigated the effect of varying the number of...
Elsner L, Neumann M. Monotonic sequences and rates of convergence of asynchronized iterative methods...
In this paper we give a classification of parallel branch and bound algorithms and develop a class o...
The directed acyclic graph (DAG) associated with a parallel algorithm captures the order in which s...