Minimizing communications when mapping affine loop nests onto distributed-memory parallel computers has already drawn a lot of attention. This paper focuses on the next step: since it is generally impossible to obtain a communication-free (or local) mapping, how can the residual communications be optimized? We explain how to take advantage of macro-communications such as broadcasts, scatters, gathers, or reductions, and how to decompose general affine communications into simpler ones that can be performed more efficiently. Finally, we give a two-step heuristic that summarizes our approach: first minimize the number of nonlocal communications, then optimize the residual affine communications using macro-communications or decompositions.
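As an illustration of the first step of the heuristic, the sketch below separates local accesses from residual (nonlocal) ones. It is not the paper's algorithm. It assumes a linear allocation model that the abstract does not spell out: a statement instance with iteration vector I runs on processor M_S·I, and array element x is stored on processor M_a·x. Under that assumption, an access a[F·I + c] is local for every I exactly when M_a·F = M_S and M_a·c = 0. The names `M_S`, `M_a`, `F`, `c`, `is_local`, and `residual_accesses` are all hypothetical.

```python
def matmul(A, B):
    """Multiply two integer matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def matvec(A, v):
    """Multiply an integer matrix by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def is_local(M_S, M_a, F, c):
    """Return True if the access a[F*I + c] needs no communication.

    Assumed model: the statement instance I runs on processor M_S*I and
    array element x lives on processor M_a*x, so the access is local for
    all I iff M_a*F == M_S and M_a*c == 0.
    """
    return matmul(M_a, F) == M_S and all(x == 0 for x in matvec(M_a, c))


def residual_accesses(M_S, M_a, accesses):
    """Keep only the accesses (name, F, c) that still need communication."""
    return [name for name, F, c in accesses if not is_local(M_S, M_a, F, c)]


# Hypothetical 2D loop nest (i, j) mapped onto a 1D processor array by row.
M_S = [[1, 0]]            # instance (i, j) runs on processor i
M_a = [[1, 0]]            # element a[x, y] is stored on processor x
IDENTITY = [[1, 0], [0, 1]]
TRANSPOSE = [[0, 1], [1, 0]]

accesses = [
    ("a[i, j]",     IDENTITY,  [0, 0]),
    ("a[i, j - 1]", IDENTITY,  [0, -1]),
    ("a[i - 1, j]", IDENTITY,  [-1, 0]),
    ("a[j, i]",     TRANSPOSE, [0, 0]),
]

residual = residual_accesses(M_S, M_a, accesses)
print(residual)
```

In this example only `a[i - 1, j]` and `a[j, i]` remain nonlocal. These residual accesses are the ones the second step would try to implement as macro-communications or to decompose into simpler communications.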
In this paper, we propose a communication cost reduction computes rule for irregular loop partitioni...
In this paper we propose a new approach to the study of the communication requirements of distribute...
Minimizing communications when mapping affine loop nests onto distributed memory parallel computers ...
Minimizing communication overhead when mapping affine loop nests onto distributed memory parallel co...
In this paper, we consider the communications involved by the execution of a complex application, de...
The aim of this thesis is the study of different methods to minimize the communication overhead due ...
Many parallel applications require periodic redistribution of workloads and associated data. In a di...
This paper describes a number of optimizations that can be used to support the efficient execution o...
Reducing communication overhead is extremely important in distributed-memory message-passing architec...
In distributed optimization for large-scale learning, a major performance limi...
Reconfiguration is largely an unexplored property in the context of parallel models of computation. ...