The aim of this thesis is to study different methods for minimizing the communication overhead caused by the parallelization of numerical kernels. The first method consists of optimizing collective communication algorithms. We have proposed novel algorithms for matrix transpose of square matrices distributed in a block fashion. We have also studied the total exchange problem. This communication scheme is useful in the parallelization of numerical kernels (for instance, the conjugate gradient algorithm). We have proposed efficient total exchange algorithms for torus topologies. The second method consists of overlapping communications with computations. We have studied some basic algorithmic principles which allow the overlap of ...
This thesis is concerned with the problem of minimizing the interprocessor data communication in par...
To amortize the cost of MPI collective operations, non-blocking collectives have been proposed so a...
The performance of programs on distributed memory parallel machines is highly dependent on the efficie...
The aim of this thesis is the study of the most useful communication schemes, especially the broadcast...
Parallel matrix multiplication is one of the most studied fundamental problems in distributed and h...
Interprocessor communication is an important aspect of parallel processing. Studies have shown that ...
In this paper we propose a new approach to the study of the communication requirements of distribute...
In this paper, we present a method for overlapping communications on parallel computers for pipeli...
Distributed memory machines consisting of multiple autonomous processors connected by a network are ...
This paper describes parallel matrix transpose algorithms on distributed memory concurrent processor...
This thesis is concerned with the design of distributed algorithms for solving optimization problems...
In distributed memory parallel architectures, interprocess communication is one of the main efficie...