Although logically available, applications may not exploit enough instantaneous communication concurrency to maximize hardware utilization on HPC systems. This is exacerbated in hybrid programming models such as SPMD+OpenMP. We present the design of a "multi-threaded" runtime able to transparently increase the instantaneous network concurrency and to provide near saturation bandwidth, independent of the application configuration and dynamic behavior. The runtime forwards communication requests from application level tasks to multiple communication servers. Our techniques alleviate the need for spatial and temporal application level message concurrency optimizations. Experimental results show improved message throughput and bandwidth by as m...
In this work we analyze the communication load imbalance generated by irregular-data applications ru...
Summarization: Every HPC system consists of numerous processing nodes interconnect using a number of...
Concurrency is needed everywhere. In emerging platforms such as wireless sensor networks, where a la...
Although logically available, applications may not exploit enough instantaneous communication concur...
A recent trend in high performance computing shows a rising number of cores per compute node, while ...
With the current continuation of Moore’s law and the presumed end of improved single core performanc...
Today's high performance systems are typically built from shared memory nodes connected by a high sp...
Shared-memory and message-passing are two op- posite models to develop parallel computations. The sh...
Multicore chips have become the standard building blocks for all current and future massively parall...
Supercomputing applications rely on strong scaling to achieve faster results on a larger number of p...
Nowadays clusters are one of the most used platforms in High Performance Computing and most programm...
Holistic tuning and optimization of hybrid MPI and OpenMP applications is becoming focus for paralle...
For communication-intensive parallel applications, the maximum degree of concurrency achievable is l...
This paper demonstrates the one-sided communication used in languages like UPC can provide a signifi...
With the end of Dennard scaling, future high performance computers are expected to consist of distri...
In this work we analyze the communication load imbalance generated by irregular-data applications ru...
Summarization: Every HPC system consists of numerous processing nodes interconnect using a number of...
Concurrency is needed everywhere. In emerging platforms such as wireless sensor networks, where a la...
Although logically available, applications may not exploit enough instantaneous communication concur...
A recent trend in high performance computing shows a rising number of cores per compute node, while ...
With the current continuation of Moore’s law and the presumed end of improved single core performanc...
Today's high performance systems are typically built from shared memory nodes connected by a high sp...
Shared-memory and message-passing are two op- posite models to develop parallel computations. The sh...
Multicore chips have become the standard building blocks for all current and future massively parall...
Supercomputing applications rely on strong scaling to achieve faster results on a larger number of p...
Nowadays clusters are one of the most used platforms in High Performance Computing and most programm...
Holistic tuning and optimization of hybrid MPI and OpenMP applications is becoming focus for paralle...
For communication-intensive parallel applications, the maximum degree of concurrency achievable is l...
This paper demonstrates the one-sided communication used in languages like UPC can provide a signifi...
With the end of Dennard scaling, future high performance computers are expected to consist of distri...
In this work we analyze the communication load imbalance generated by irregular-data applications ru...
Summarization: Every HPC system consists of numerous processing nodes interconnect using a number of...
Concurrency is needed everywhere. In emerging platforms such as wireless sensor networks, where a la...