Multi-core multi-socket distributed shared-memory computers (DSM computers, for short) have become an important node architecture in scientific computing as they provide substantial computational capacity with relatively low space and power requirements. Compared to conventional computer networks, inter-chip networks used in DSM computers feature higher bandwidth, lower latency and tighter integration with the CPU. The inter-chip network is a shared resource among the user application and many other services, which can lead to considerable variation of execution times of identical communication tasks. In this work, we explore traffic patterns resulting from MPI collective communication primitives and investigate the question whether...
Previous work in scalable hardware distributed shared memory (DSM) multiprocessors has established t...
Recent technological advances have produced network interfaces that provide users with very low-late...
A shared memory multiprocessor having clusters of processing elements and memory modules is proposed...
The transition to multi-core architectures can be attributed mainly to fundamental limitations in cl...
A multi-core cluster is a cluster composed of numbers of nodes where each node has a number of proce...
The task graph of telecommunication applications often exhibits massive coarse...
We evaluate the architectural support of collective communication operations on the IBM SP2, Cray T3...
As commodity components continue to dominate the realm of high-end computing, two hardware trends ha...
The goal of this paper is to gain insight into the relative performance of communication mechanisms ...
A sequential computer executes one CPU instruction at a time. Over the years sequential computers ha...
Due to advances in fiber optics and VLSI technology, interconnection networks that allow simultaneou...
This talk discusses optimized collective algorithms and the benefits of leveraging independent hardw...
The performance evaluation of multiprocessor interconnects cannot be divorced from issues of traffic...