The placement of tasks in a parallel application on specific nodes of a supercomputer can significantly impact performance. Traditionally, this task mapping has focused on reducing the distance between communicating tasks on the physical network. This minimizes the number of hops that point-to-point messages travel and thus reduces link sharing between messages and contention. However, for applications that use collectives over sub-communicators, this heuristic may not be optimal. Many collectives can benefit from an increase in bandwidth even at the cost of an increase in hop count, especially when sending large messages. For example, placing communicating tasks in a cube configuration rather than a plane or a line on a torus netw...
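The hop-count metric this abstract refers to can be made concrete with a small sketch. The snippet below (not from any of the listed papers; the torus dimensions and placements are illustrative assumptions) computes the total pairwise hop distance on a 3D torus with wraparound links, comparing a line placement of eight tasks against a 2x2x2 cube placement of the same eight tasks.

```python
# Sketch: quantifying task placements on a 3D torus by total pairwise
# hop distance. Torus size and placements are illustrative assumptions,
# not taken from the papers above.
from itertools import combinations, product

DIMS = (8, 8, 8)  # an assumed 8x8x8 torus

def torus_hops(a, b):
    """Minimum hop count between nodes a and b, accounting for wraparound."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, DIMS))

def total_pairwise_hops(nodes):
    """Sum of hop distances over all communicating pairs (all-to-all pattern)."""
    return sum(torus_hops(a, b) for a, b in combinations(nodes, 2))

line = [(i, 0, 0) for i in range(8)]      # 8 tasks along one torus dimension
cube = list(product(range(2), repeat=3))  # the same 8 tasks in a 2x2x2 cube

print(total_pairwise_hops(line))  # line placement
print(total_pairwise_hops(cube))  # cube placement
```

For an all-to-all pattern within the group, the cube placement yields a lower total hop count and spreads traffic over links in all three dimensions, which is the bandwidth advantage the abstract alludes to; for other communication patterns the trade-off can go the other way, which is the paper's point.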
This talk discusses optimized collective algorithms and the benefits of leveraging independent hardw...
The orchestration of communication of distributed memory parallel applications on a parallel compute...
Considering the large number of processors and the size of the interconnection networks on exascale ...
We report on a project to develop a unified approach for building a library of collective communicat...
Governments, universities, and companies expend vast resources building the top supercomputers. The...
The efficient implementation of collective communication operations has received much attention. Ini...
Collective communications occupy 20-90% of total execution times in many MPI applications. In this p...
Petascale machines with hundreds of thousands of cores are being built. These machines have varying ...
Abhinav Bhatele, Ph.D. student at the Parallel Programming Lab at the University of Illinois present...
Proceedings of: First International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2014...
The efficient implementation of collective communication operations has received much attention. Initia...
Technology trends suggest that future machines will rely on parallelism to meet increasing performan...
127 p. Thesis (Ph.D.)--University of Illinois at Urbana-Champaign, 2005. In this thesis, we motivate t...
Technology trends suggest that future machines will rely on parallelism to meet increasing performanc...
Supercomputers continue to expand both in size and complexity as we reach the beginning of the exasc...