Abstract. As the scale of supercomputers grows, so does the size of the interconnect network. Topology-aware task mapping, which maps parallel application processes onto processors to reduce communication cost, becomes increasingly important. Previous works mainly focus on the task mapping between compute nodes (i.e., inter-node mapping), while ignoring the mapping within a node (i.e., intra-node mapping). In this paper, we propose a hierarchical task mapping strategy, which performs both inter-node and intra-node mapping. We consider supercomputers with popular fat-tree and torus network topologies, and introduce two mapping algorithms: (1) a generic recursive tree mapping algorithm, which can handle both inter-node mapping and intra-no...
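The recursive tree mapping idea above can be illustrated with a minimal sketch: split the task set into two halves that minimize communication across the split, then recursively assign each half to one subtree of the node hierarchy. Everything below (the edge-dictionary graph format, the greedy swap refinement, the assumption of power-of-two task and node counts) is an illustrative assumption, not the algorithm from any particular paper cited here.

```python
def cut_weight(graph, left):
    """Total weight of edges crossing the (left, rest) partition.

    graph: dict mapping (task_u, task_v) -> communication volume.
    left:  set of tasks on one side of the split.
    """
    return sum(w for (u, v), w in graph.items()
               if (u in left) != (v in left))

def recursive_map(tasks, nodes, graph):
    """Map each task to a node by recursive bisection.

    Assumes len(tasks) and len(nodes) are powers of two for a
    balanced split; a real implementation would handle remainders.
    """
    if len(nodes) == 1:
        return {t: nodes[0] for t in tasks}
    half = len(tasks) // 2
    left, right = list(tasks[:half]), list(tasks[half:])
    # Greedy refinement: apply the best single task swap between the
    # halves as long as it reduces the communication cut.
    improved = True
    while improved:
        improved = False
        base = cut_weight(graph, set(left))
        best = None
        for i, a in enumerate(left):
            for j, b in enumerate(right):
                gain = base - cut_weight(graph, set(left) - {a} | {b})
                if gain > 0 and (best is None or gain > best[0]):
                    best = (gain, i, j)
        if best:
            _, i, j = best
            left[i], right[j] = right[j], left[i]
            improved = True
    # Recurse: each refined half goes to one half of the node tree.
    mapping = recursive_map(left, nodes[:len(nodes) // 2], graph)
    mapping.update(recursive_map(right, nodes[len(nodes) // 2:], graph))
    return mapping
```

For example, with tasks 0 and 1 communicating heavily and tasks 2 and 3 likewise, the refinement places each heavy pair on the same node even when the initial ordering interleaves them. The same recursion applies at every level of a fat-tree, which is what makes a tree-structured algorithm usable for both inter-node and intra-node mapping.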
This paper presents a generic technique for mapping parallel algorithms onto parallel architectures....
Interconnection networks in parallel platforms can be made of thousands of nod...
Obtaining the best performance from a parallel program involves four important steps: 1. Choice of t...
Considering the large number of processors and the size of the interconnection...
Abhinav Bhatele, Ph.D. student at the Parallel Programming Lab at the University of Illinois present...
Petascale machines with hundreds of thousands of cores are being built. These machines have varying ...
Abstract—Cosmology simulations are highly communication-intensive, thus it is critical to exploit to...
To execute a parallel program on a multicomputer system, the tasks of the program have to be mapped ...
Partitioning graphs into blocks of roughly equal size such that few edges run between blocks is a fr...
The task-to-processor mapping problem is addressed in the context of a local-memory multiprocessor w...
Topology aware mapping has started to attain interest again by the development of supercomputers who...
Abstract—We present a new method for mapping applications' MPI tasks to cores of a parallel comput...
The dragonfly network topology has recently gained traction in the design of high performance comput...
Abstract. Static mapping is the assignment of parallel processes to the processing elements (PEs) of...