Data reduction is a fundamental operation in parallel computing. We derive lower bounds on the communication latency of global data reduction and multiple global data reduction on reconfigurable tori, and we present optimal algorithms for both operations on reconfigurable tori of any dimension. The formal reduction algorithms we give make the reduction and broadcast operations in [5][6][7] easy to implement.
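To make the operation concrete: a global data reduction combines one value from every processor with an associative operator, and on a torus this is typically done one dimension at a time. The sketch below is a serial simulation of that dimension-wise combining on a 2-D torus, not the paper's reconfigurable-bus algorithm; the function name and grid layout are illustrative assumptions.

```python
from functools import reduce
import operator

def torus_global_reduce(grid, op=operator.add):
    """Simulate a global data reduction on a 2-D torus, performed
    dimension by dimension: first combine along each row, then combine
    the per-row results along the remaining dimension. Every processor's
    value is folded in exactly once."""
    # Phase 1: reduce along dimension 0 (each row of processors).
    row_results = [reduce(op, row) for row in grid]
    # Phase 2: reduce the per-row results along dimension 1.
    return reduce(op, row_results)

# One value per processor on a 3x4 torus.
grid = [[r * 4 + c for c in range(4)] for r in range(3)]
print(torus_global_reduce(grid))  # sum of 0..11 = 66
```

The same dimension-by-dimension structure extends to a d-dimensional torus by applying one reduction phase per dimension.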
Interprocessor communication often dominates the runtime of large matrix computations. We present a ...
We analyze the inherent complexity of implementing Lévy's notion of optimal evaluation for t...
Two and three dimensional k-tori are among the most used topologies in the design of new parallel co...
In this paper we study the tradeoff between parallelism and communication cost in a map-reduce compu...
In the Generalized Minimal Residual Method (GMRES), the global all-to-all communication required i...
All-to-all personalized communication, also known as complete exchange, is one of the most dense com...
Consider a network of processor elements arranged in a d-dimensional grid, where each processor can ...
Near-optimal gossiping algorithms are given for two- and higher dimensional tori. It is assumed that...
The performance of a high-performance parallel or distributed computation depends heavily on minimiz...
Reconfiguration is largely an unexplored property in the context of parallel models of computation. ...
High-dimensional simulations pose a challenge even for next-generation high-performance computers. H...
We discuss the problem of efficient, general-purpose parallel computation. Parallel processing ...
Parallelizing sparse irregular applications on distributed memory systems poses serious scalability c...