Modern supercomputers have very powerful multi-core CPUs. The programming model on these supercomputers is switching from pure MPI to MPI for inter-node communication, with shared memory and threads for intra-node communication. Consequently, the bottleneck in most systems is no longer computation but communication between nodes. In this paper, we present a new compositing algorithm for hybrid MPI parallelism that focuses on communication avoidance and overlapping communication with computation at the expense of evenly balancing the workload. The algorithm has three stages: a direct send stage, where nodes are arranged in groups and exchange regions of an image, followed by a tree compositing stage and a gather stage. We compare our algorithm...
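The three stages named in the abstract can be illustrated with a small serial simulation. This is only a sketch of one plausible reading of the description, not the authors' implementation: it uses no MPI or threads, and the function names (`over`, `over_region`, `split`, `composite`), the flat pixel layout, the premultiplied (color, alpha) pixel format, and the `group_size` parameter are all assumptions made for illustration. Ranks are assumed to be sorted front to back.

```python
from functools import reduce

def over(front, back):
    """Composite one premultiplied (color, alpha) pixel over another."""
    cf, af = front
    cb, ab = back
    return (cf + (1.0 - af) * cb, af + (1.0 - af) * ab)

def over_region(front, back):
    """Apply the over operator pixel by pixel to two equal-sized regions."""
    return [over(f, b) for f, b in zip(front, back)]

def split(image, parts):
    """Split a flat pixel list into `parts` contiguous regions."""
    n = len(image)
    bounds = [n * i // parts for i in range(parts + 1)]
    return [image[bounds[i]:bounds[i + 1]] for i in range(parts)]

def composite(images, group_size):
    """Simulate the three stages; images[r] is the full image of rank r.

    Hypothetical layout: consecutive ranks form a group, and ranks are
    ordered front to back.
    """
    assert len(images) % group_size == 0
    groups = [images[i:i + group_size]
              for i in range(0, len(images), group_size)]

    # Stage 1 (direct send): inside each group, member j collects region j
    # from every member and composites those pieces in depth order.
    partial = []  # partial[g][j] = region j composited over group g
    for group in groups:
        regions = [split(img, group_size) for img in group]
        partial.append([
            reduce(over_region, (regions[m][j] for m in range(group_size)))
            for j in range(group_size)
        ])

    # Stage 2 (tree compositing): owners of the same region in different
    # groups are merged pairwise, doubling the stride each round.
    stride = 1
    while stride < len(partial):
        for g in range(0, len(partial) - stride, 2 * stride):
            partial[g] = [over_region(f, b)
                          for f, b in zip(partial[g], partial[g + stride])]
        stride *= 2

    # Stage 3 (gather): the finished regions are concatenated at the root.
    return [px for region in partial[0] for px in region]
```

Because the over operator is associative, the result should match a plain front-to-back composite of all images, for example `reduce(over_region, images)`, up to floating-point rounding. In a real hybrid MPI code, the exchanges in stages 1 and 2 would be messages between nodes that could overlap with the blending work.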
With the increasing prominence of many-core architectures and decreasing per-core resource...
The only proven method for performing distributed-memory parallel rendering at large scales, tens of...
This work presents and evaluates algorithms for MPI collective communication operations on high perf...
In the medical field, volume rendering provides good quality 3D visualizations but is still not enou...
Collective communication operations can dominate the cost of large-scale parallel algorithms. Image ...
In this paper, we present a scalable three dimensional hybrid MPI+Threads parallel Delaunay ...
Component trees are region-based representations that encode the inclusion rel...
To amortize the cost of MPI collective operations, non-blocking collectives ha...
Communication overhead is one of the dominant factors affecting performance in high-end computing sy...
This talk discusses optimized collective algorithms and the benefits of leveraging independent hardw...
Communication remains a significant barrier to scalability on distributed-memory systems. At present...
In High Performance Computing (HPC), minimizing communication overhead is one of the most important ...
After a brief introduction on Cross Motif Search and its OpenMP and Hybrid OpenMP-MPI implementatio...
© 2004 Institute of Information Science Academia Sinica - The binary-swap (BS) and the p...