In this paper we present several algorithms for performing all-to-many personalized communication on distributed memory parallel machines. We assume that each processor sends a different message (of potentially different size) to a subset of all the processors involved in the collective communication. The algorithms are based on decomposing the communication matrix into a set of partial permutations. We study the effectiveness of our algorithms from the viewpoint of both static scheduling and runtime scheduling.
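One way to picture the decomposition into partial permutations is as bipartite edge coloring. Senders and receivers form the two vertex sets, each message is an edge, and each color is a communication phase in which no processor sends or receives more than once. The sketch below is a minimal illustration, assuming a 0/1 communication matrix `comm[i][j]` (nonzero when processor i sends to j). It uses the classical alternating-path recoloring behind König's edge-coloring theorem, which produces exactly as many phases as the largest number of messages any single processor sends or receives. It is not necessarily the decomposition used in the paper, and the function name `decompose_into_permutations` is hypothetical.

```python
from typing import Dict, List, Tuple


def decompose_into_permutations(comm: List[List[int]]) -> List[List[Tuple[int, int]]]:
    """Split the messages of a P x P communication matrix into phases.

    Each phase is a partial permutation: every processor sends at most one
    message and receives at most one message in that phase. This is a sketch
    based on bipartite edge coloring (Konig's theorem), so the number of phases
    equals the maximum send or receive degree of any processor.
    """
    p = len(comm)
    edges = [(i, j) for i in range(p) for j in range(p) if comm[i][j]]
    out_deg = [sum(1 for j in range(p) if comm[i][j]) for i in range(p)]
    in_deg = [sum(1 for i in range(p) if comm[i][j]) for j in range(p)]
    delta = max(out_deg + in_deg, default=0)

    # send_col[u][c] = receiver of u's message in phase c; recv_col is the mirror.
    send_col: List[Dict[int, int]] = [{} for _ in range(p)]
    recv_col: List[Dict[int, int]] = [{} for _ in range(p)]

    def free_color(used: Dict[int, int]) -> int:
        # Smallest phase in which this processor is still idle.
        return next(c for c in range(delta) if c not in used)

    def flip_path(v: int, a: int, b: int) -> None:
        # Walk the a/b alternating path that starts at receiver v with color a,
        # then swap colors a and b on every edge of that path.
        path = []
        node, at_receiver, c = v, True, a
        while True:
            nxt = (recv_col if at_receiver else send_col)[node].get(c)
            if nxt is None:
                break
            path.append((nxt, node, c) if at_receiver else (node, nxt, c))
            node, at_receiver = nxt, not at_receiver
            c = b if c == a else a
        for s, r, c in path:
            del send_col[s][c]
            del recv_col[r][c]
        for s, r, c in path:
            swapped = b if c == a else a
            send_col[s][swapped] = r
            recv_col[r][swapped] = s

    for u, v in edges:
        a = free_color(send_col[u])  # a phase in which sender u is idle
        b = free_color(recv_col[v])  # a phase in which receiver v is idle
        if a in recv_col[v]:
            flip_path(v, a, b)  # frees phase a at v without touching u
        send_col[u][a] = v
        recv_col[v][a] = u

    # Phase c lists the (sender, receiver) pairs that communicate in step c.
    return [sorted((u, send_col[u][c]) for u in range(p) if c in send_col[u])
            for c in range(delta)]


if __name__ == "__main__":
    comm = [
        [0, 1, 1, 0],
        [1, 0, 1, 1],
        [0, 0, 0, 1],
        [1, 1, 0, 0],
    ]
    for k, phase in enumerate(decompose_into_permutations(comm)):
        print(f"phase {k}: {phase}")
```

For this matrix the sketch prints three phases, because processor 1 sends three messages and no processor sends or receives more. Note that it ignores message sizes; since the paper allows messages of different sizes, the cost of a phase would in practice be governed by its largest message, and a size-aware schedule would need an additional heuristic.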
In this paper, we study the communication characteristics of the CM-5 and the performance effects of...
A fundamental challenge for parallel computing is to obtain high-level, architecture independent, a...
We present an algorithm for all-to-all personalized communication, in which every processor has an i...
In this paper we present several algorithms for performing all-to-many personalized communication on...
In this paper we present several algorithms for all-to-many personalized communications which avoid...
With the advent of new routing methods, the distance to which a message is sent is becoming relative...
In this paper we present several algorithms for decomposing all-to-many personalized communication i...
Parallelization of many irregular applications results in unstructured collective communication. In ...
This paper presents solutions for the problem of many-to-many personalized communication, with bound...
This paper presents algorithms for implementing the transportation primitive on a distributed memory...
This paper describes a number of optimizations that can be used to support the efficient execution o...
In irregular all-to-all communication, messages are exchanged between every pair of processors. The ...
In this paper, we develop portable and scalable algorithms for performing irregular all-to-all commu...
A fundamental challenge for parallel computing is to obtain high-level, architecture independent, al...