This paper presents algorithms for implementing the transportation primitive on a distributed memory parallel architecture. The transportation primitive performs many-to-many personalized communication with bounded incoming and outgoing traffic. We present a two-stage deterministic algorithm that decomposes a communication with possibly high variance in message size into two communication stages with low message size variance. If the maximum outgoing or incoming traffic at any processor is $t$, transportation can be done in $2t\mu$ time (plus lower-order terms) when $t \geq O(p^2 + p\tau/\mu)$, where $\mu$ is the inverse of the data transfer rate and $\tau$ is the startup overhead. If the maximum outgoing and incoming traffic are $r$ and $c$ respectively, transportation can be do...
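The two-stage idea described above can be illustrated with a small traffic model. This is a minimal sketch under stated assumptions, not the paper's algorithm: the function name `two_stage_traffic` is hypothetical, every message is assumed to be split into p equal pieces routed through all p processors as intermediates, and exact fractions are used so that uneven sizes need no rounding.

```python
from fractions import Fraction


def two_stage_traffic(m):
    """Model a two-stage routing of a p x p message-size matrix.

    m[i][j] is the size of the message from processor i to processor j.
    Assumption (not taken from the paper): each message is cut into p
    equal pieces, and piece k travels i -> k in stage 1, k -> j in stage 2.

    Returns (stage1, stage2), where stage1[i][k] is the amount processor i
    sends to intermediate k, and stage2[k][j] is the amount intermediate k
    forwards to final destination j.
    """
    p = len(m)
    stage1 = [[Fraction(0)] * p for _ in range(p)]
    stage2 = [[Fraction(0)] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            piece = Fraction(m[i][j], p)  # one of p equal pieces of m[i][j]
            for k in range(p):
                stage1[i][k] += piece
                stage2[k][j] += piece
    return stage1, stage2


if __name__ == "__main__":
    # Highly uneven input: processor 0 sends one large message.
    sizes = [[0, 9, 0],
             [3, 0, 0],
             [0, 0, 6]]
    s1, s2 = two_stage_traffic(sizes)
    for name, stage in (("stage 1", s1), ("stage 2", s2)):
        print(name, [[str(x) for x in row] for row in stage])
```

Under this splitting rule, every stage-1 message from processor i has size (outgoing traffic of i)/p and every stage-2 message to processor j has size (incoming traffic of j)/p, so if both kinds of traffic are bounded by t, no single message in either stage exceeds t/p.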
In this paper we present several algorithms for all-to-many personalized communications which avoid...
The goal of this paper is to present practical experiments on broadcasting algorithms on a c...
We study the effect of limited communication throughput on parallel computation in a setting...
This paper presents solutions for the problem of many-to-many personalized communication, with bound...
In this paper we present several algorithms for performing all-to-many personalized communication on...
This dissertation focuses on scalable parallel algorithms for irregular communication, random data a...
In this paper, we study the various communication algorithms on the pipeline multicomputer. We show ...
David R. Helman, David A. Bader, Joseph JáJá. Institute for Advanced Computer Stud...
In this paper we propose a new approach to the study of the communication requirements of distribute...
A sequential computer executes one CPU instruction at a time. Over the years sequential computers ha...
Efficient communication in networks is a prerequisite to exploit the performance of large parallel...
Parallelization of many irregular applications results in unstructured collective communication. In ...
In this paper, we survey many of the approaches that have been proposed for solving communic...