Computation-Communication Overlap of Linpack on a GPU-Accelerated PC Cluster

Ohmura Junichi
Miyoshi Takefumi
Irie Hidetsugu
Yoshinaga Tsutomu

Publication date

December 2011

DOI

10.1587/transinf.e94.d.2319

Publisher

The Institute of Electronics, Information and Communication Engineers

ISSN

0916-8532

Abstract

In this paper, we propose an approach to obtaining enhanced performance of the Linpack benchmark on a GPU-accelerated PC cluster connected via relatively slow inter-node connections. For one node with a quad-core Intel Xeon W3520 processor and an NVIDIA Tesla C1060 GPU card, we implement a CPU-GPU parallel double-precision general matrix-matrix multiplication (dgemm) operation, and achieve a performance improvement of 34% compared with the GPU-only case and 64% compared with the CPU-only case. For an entire 16-node cluster, each node of which is the same as the above and is connected with two gigabit Ethernet links, we use a computation-communication overlap scheme with GPU acceleration for the Linpack benchmark, and achieve a performance impro...
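The abstract does not state how the dgemm work is divided between the CPU and the GPU. Purely as an illustration, the Python sketch below assumes a static split of the columns of C = A·B in proportion to assumed per-device throughputs. The function names `split_columns`, `matmul_cols`, and `split_dgemm`, and the GFLOPS figures, are hypothetical and not taken from the paper. Both column blocks are computed sequentially on the CPU with a naive loop here, only to show that the partitioned result matches the full product.

```python
# Hypothetical sketch of a static CPU-GPU column split for C = A * B.
# Assumption: work is divided in proportion to assumed device throughputs;
# the paper's actual partitioning scheme is not described in the abstract.

def split_columns(n, gpu_gflops, cpu_gflops):
    """Return how many of the n columns go to the GPU; the rest go to the CPU."""
    share = gpu_gflops / (gpu_gflops + cpu_gflops)
    return round(n * share)

def matmul_cols(a, b, col_start, col_end):
    """Compute columns [col_start, col_end) of A*B.

    a is m x k and b is k x n, both stored as lists of rows.
    """
    m, k = len(a), len(b)
    out = [[0.0] * (col_end - col_start) for _ in range(m)]
    for i in range(m):
        for j in range(col_start, col_end):
            s = 0.0
            for p in range(k):
                s += a[i][p] * b[p][j]
            out[i][j - col_start] = s
    return out

def split_dgemm(a, b, gpu_gflops, cpu_gflops):
    """Compute A*B as two column blocks and join them; returns (C, n_gpu)."""
    n = len(b[0])
    n_gpu = split_columns(n, gpu_gflops, cpu_gflops)
    # In a real CPU-GPU implementation this block would be offloaded to the GPU
    # (e.g. through a GPU BLAS library) while the CPU block runs concurrently;
    # here both blocks are simply computed one after the other.
    gpu_part = matmul_cols(a, b, 0, n_gpu)
    cpu_part = matmul_cols(a, b, n_gpu, n)
    # Join the two column blocks row by row to form the full result.
    c = [g_row + c_row for g_row, c_row in zip(gpu_part, cpu_part)]
    return c, n_gpu

if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0, 7.0], [8.0, 9.0, 10.0]]
    c, n_gpu = split_dgemm(a, b, gpu_gflops=60.0, cpu_gflops=40.0)
    print(n_gpu, c)  # 2 [[21.0, 24.0, 27.0], [47.0, 54.0, 61.0]]
```

With the assumed 60:40 throughput ratio, two of the three columns go to the "GPU" block. A static proportional split is the simplest choice; in practice the ratio would have to be tuned to the measured dgemm rates of each device.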
