The divergence of application behavior from optimal network usage leads to communication-induced performance bottlenecks. Communication performance is known to degrade when handling large numbers of small messages, because of per-message envelope overhead and repeated traversals of the communication stack. Prior work has attempted to mitigate this by aggregating small messages, but it has only studied cases where the message size is constant and known ahead of time. This thesis explores the applicability of this optimization to variable-sized messages and to machines with a large number of cores, analyzing both the theoretical considerations involved and the performance gains achieved in practice. Th...
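As a rough illustration of what aggregating variable-sized messages can involve, the following minimal Python sketch buffers small payloads per destination and sends them as one combined packet. It is not the thesis's implementation: the `Aggregator` class, the `send` callback, the 4-byte length-prefix framing, and the `capacity` threshold are all assumptions introduced here for illustration.

```python
import struct
from collections import defaultdict

# Assumed framing: each message is preceded by a 4-byte little-endian length,
# so messages of different sizes can be separated again at the receiver.
HEADER = struct.Struct("<I")


class Aggregator:
    """Hypothetical per-destination buffer that combines small messages."""

    def __init__(self, send, capacity=4096):
        self.send = send          # assumed callback: send(dest, packet_bytes)
        self.capacity = capacity  # assumed flush threshold, in bytes
        self.buffers = defaultdict(bytearray)

    def post(self, dest, payload):
        """Queue one message; flush first if it would overflow the buffer."""
        framed = HEADER.pack(len(payload)) + payload
        buf = self.buffers[dest]
        if buf and len(buf) + len(framed) > self.capacity:
            self.flush(dest)
            buf = self.buffers[dest]
        buf += framed  # in-place append to the destination's bytearray

    def flush(self, dest):
        """Send everything queued for one destination as a single packet."""
        buf = self.buffers.pop(dest, None)
        if buf:
            self.send(dest, bytes(buf))

    def flush_all(self):
        """Send all pending buffers, e.g. at the end of a communication phase."""
        for dest in list(self.buffers):
            self.flush(dest)


def unpack(packet):
    """Split an aggregated packet back into the original messages."""
    messages, offset = [], 0
    while offset < len(packet):
        (length,) = HEADER.unpack_from(packet, offset)
        offset += HEADER.size
        messages.append(packet[offset:offset + length])
        offset += length
    return messages
```

In this sketch the envelope and stack-traversal cost is paid once per packet instead of once per message, at the price of a small length header per message and added latency while messages wait in the buffer. A message larger than `capacity` is simply sent in a packet of its own at the next flush.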
This work presents modulo unrolling without unrolling (modulo unrolling WU), a method for message ag...
Current and future supercomputers have tens of thousands of compute nodes interconnected with high-d...
Reducing communication overhead is extremely important in distributed-memory message-passing archite...
High overhead of fine-grained communication is a significant performance bottleneck for many classes...
High overhead of fine-grained communication is a significant performance bottleneck for many classes...
Fine-grained communication in supercomputing applications often limits performance through...
Governments, universities, and companies expend vast resources building the top supercomputers. The...
Supercomputers continue to expand both in size and complexity as we reach the beginning of the exasc...
Big Data applications have gained importance over the last few years. Such applications focus on the...
Since the invention of the transistor, clock frequency increase was the primary method of improving ...
In this work we analyze the communication load imbalance generated by irregular-data applications ru...
Partitioned Global Address Space (PGAS) languages promise to deliver improved programmer productivi...
The current trends in high performance computing show that large machines with tens of thousands of ...
As computer networks increase in size, become more heterogeneous and span greater geographic dista...
Technology trends suggest that future machines will rely on parallelism to meet increasing performanc...