Abstract: We describe the implementation and performance of dense matrix multiplication and LU decomposition on the GRAPE-DR SIMD accelerator board. A GRAPE-DR card, with four GRAPE-DR chips, has a theoretical peak double-precision performance of 819 Gflops. Each GRAPE-DR chip has 512 processing elements (PEs) and operates at a 400 MHz clock frequency; each PE can perform one addition and one multiplication every two clock cycles. The measured performance of matrix multiplication is 730 Gflops for the multiplication of a 51200-by-2048 matrix by a 2048-by-51200 matrix. The performance of LU decomposition is 480 Gflops for a problem size of 51200.
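The peak and efficiency figures quoted above can be checked with simple arithmetic: one addition plus one multiplication every two cycles is one flop per cycle per PE. A minimal sketch (all numbers taken from the abstract; the efficiency percentages are derived here, not stated in the text):

```python
# Back-of-the-envelope check of the GRAPE-DR performance figures.

chips_per_card = 4
pes_per_chip = 512
clock_hz = 400e6              # 400 MHz
flops_per_pe_per_cycle = 1    # one add + one mul every two cycles = 1 flop/cycle

peak_flops = chips_per_card * pes_per_chip * clock_hz * flops_per_pe_per_cycle
print(f"peak: {peak_flops / 1e9:.1f} Gflops")  # 819.2 Gflops, matching the text

# Measured figures from the abstract:
dgemm_measured = 730e9   # matrix multiplication
lu_measured = 480e9      # LU decomposition

print(f"DGEMM efficiency: {dgemm_measured / peak_flops:.0%}")
print(f"LU efficiency:    {lu_measured / peak_flops:.0%}")
```

This puts the matrix-multiplication run at roughly 89% of theoretical peak and the LU decomposition at roughly 59%, which is the usual pattern: LU spends part of its time in panel factorization and data movement that the accelerator cannot overlap with GEMM-like work.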
We provide efficient single- and double-precision GPU (Graphics Processing Unit) implementations of...
General matrix-matrix multiplications with double-precision real and complex entries (DGEMM and ZGEM...
Sparse matrix–vector multiplication (SpMV) is of singular importance in sparse linear algebra, which...
Abstract: We describe the design and performance of the GRAPE-MP board, a SIMD accelerator board for ...
Abstract: Few realize that, for large matrices, many dense matrix computations achieve nearly the sa...
In this project I optimized the Dense Matrix-Matrix multiplication calculation by tiling the matrice...
Abstract: This paper presents results of our study on double-precision general matrix-matrix multiplic...
As users and developers, we are witnessing the opening of a new computing scenario: the introduction...
Abstract. Implementations of the Basic Linear Algebra Subprograms (BLAS) interface are major buildin...
Proceedings of: Third International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2016...
We implement two novel algorithms for sparse-matrix dense-matrix multiplication (SpMM) on the GPU. O...
This repository contains the code and scripts for verifying the claims in the paper "Design Principl...
General matrix-matrix multiplications (GEMM) in vendor-supplied BLAS libraries are best optimized fo...
The performance of a parallel matrix-matrix-multiplication routine with the same functionality as DG...