Accelerating Linpack with CUDA on heterogenous clusters

Massimiliano Fatica

Publication date

December 2014

Abstract

This paper describes the use of CUDA to accelerate the Linpack benchmark on heterogeneous clusters, where both CPUs and GPUs are used in synergy with minor or no modifications to the original source code. A host library intercepts the calls to DGEMM and DTRSM and executes them simultaneously on both GPUs and CPU cores. An 8U cluster is able to sustain more than a teraflop using a CUDA-accelerated version of HPL.
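
To illustrate how a single DGEMM call might be shared between a GPU and the CPU cores, the sketch below splits the update C = alpha*A*B + beta*C by columns and computes the two column blocks concurrently. It is only an illustration under stated assumptions, not the paper's library: the names `gemm_block` and `split_gemm` and the `gpu_fraction` parameter are hypothetical, the "GPU" share is simulated by an ordinary Python function, and the column-wise split is one plausible way to divide the work.

```python
from concurrent.futures import ThreadPoolExecutor


def gemm_block(alpha, A, B, beta, C, col_lo, col_hi):
    """Reference GEMM restricted to columns [col_lo, col_hi) of B and C.

    Returns the updated columns as a new list of rows. This stands in for
    whatever device (GPU or CPU BLAS) would really compute the block.
    """
    m, k = len(A), len(B)
    out = [[0.0] * (col_hi - col_lo) for _ in range(m)]
    for i in range(m):
        for j in range(col_lo, col_hi):
            acc = 0.0
            for p in range(k):
                acc += A[i][p] * B[p][j]
            out[i][j - col_lo] = alpha * acc + beta * C[i][j]
    return out


def split_gemm(alpha, A, B, beta, C, gpu_fraction=0.75):
    """Hypothetical interception point: split one GEMM into two column blocks.

    gpu_fraction is an assumed tuning knob giving the share of columns
    handed to the (simulated) GPU; the remainder goes to the CPU.
    """
    n = len(C[0])
    n_gpu = max(0, min(n, round(n * gpu_fraction)))
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Both blocks are submitted before either result is awaited,
        # so the two "devices" can work at the same time.
        gpu_part = pool.submit(gemm_block, alpha, A, B, beta, C, 0, n_gpu)
        cpu_part = pool.submit(gemm_block, alpha, A, B, beta, C, n_gpu, n)
        left, right = gpu_part.result(), cpu_part.result()
    # Stitch the two column blocks back into one result matrix.
    return [l_row + r_row for l_row, r_row in zip(left, right)]
```

In a real setting the split ratio would presumably be chosen from the relative DGEMM throughput of the GPU and the CPU cores, so that both finish at about the same time; here it is just a parameter.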
