In this paper we present several algorithms for performing all-to-many personalized communication on distributed memory parallel machines. We assume that each processor sends a different message (of potentially different size) to a subset of all the processors involved in the collective communication. The algorithms are based on decomposing the communication matrix into a set of partial permutations. We study the effectiveness of our algorithms both from the view of static scheduling and from runtime scheduling.

Index Terms: Loosely synchronous communication, node contention, non-uniform message size, personalized communications, runtime scheduling, static scheduling.

1 Introduction

Load balancing and reduction of communication are two i...
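The decomposition idea stated in the abstract above (splitting the communication matrix into partial permutations) can be illustrated with a minimal greedy sketch. This is not one of the paper's algorithms; the function name `decompose`, the matrix layout (`com[i][j]` is the size of the message from processor `i` to processor `j`, with 0 meaning no message), and the first-fit choice of receiver are all assumptions made for illustration.

```python
def decompose(com):
    """Greedily split a communication matrix into partial permutations.

    com[i][j] > 0 means processor i sends a message of that size to j
    (assumed layout). Each returned phase maps sender -> receiver, with
    every sender and every receiver appearing at most once, so no node
    receives two messages in the same phase.
    """
    n = len(com)
    # pending[i] lists the destinations processor i still has to send to.
    pending = [[j for j in range(n) if com[i][j] > 0] for i in range(n)]
    phases = []
    while any(pending):
        busy = set()   # receivers already claimed in this phase
        phase = {}     # sender -> receiver for this phase
        for i in range(n):
            for j in pending[i]:
                if j not in busy:
                    phase[i] = j
                    busy.add(j)
                    pending[i].remove(j)
                    break  # a sender transmits at most once per phase
        phases.append(phase)
    return phases


# Hypothetical 3-processor example with non-uniform message sizes.
com = [
    [0, 3, 1],
    [2, 0, 0],
    [4, 5, 0],
]
phases = decompose(com)
for step, phase in enumerate(phases):
    print(step, sorted(phase.items()))
```

A first-fit schedule like this ignores message sizes when pairing senders and receivers, so a phase may mix long and short messages; it only demonstrates the contention-free structure of a partial permutation.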
A fundamental challenge for parallel computing is to obtain high-level, architecture independent, al...
We present an algorithm for all-to-all personalized communication, in which every processor has an i...
Collective communication allows efficient communication and synchronization among a collection of pr...
In this paper we present several algorithms for all-to-many personalized communications which avoid...
With the advent of new routing methods, the distance to which a message is sent is becoming relative...
With the advent of new routing methods, the distance to which a message is sent is becoming relative...
In this paper we present several algorithms for decomposing all-to-many personalized communication i...
Parallelization of many irregular applications results in unstructured collective communication. In ...
This paper presents solutions for the problem of many-to-many personalized communication, with bound...
This paper describes a number of optimizations that can be used to support the efficient execution o...
In irregular all-to-all communication, messages are exchanged between every pair of processors. The ...
In this paper, we develop portable and scalable algorithms for performing irregular all-to-all commu...
Collective operations are among the most important communication operations in shared- and distribut...
In applications requiring very high throughput or which have real-time deadlines, the use of paralle...