While the growing number of cores per chip allows researchers to solve larger scientific and engineering problems, the parallel efficiency of the deployed parallel software starts to decrease. This unscalability problem affects both vendor-provided and open-source software and wastes CPU cycles and energy. Expecting CPUs with hundreds of cores to be imminent, we have designed a new framework to perform matrix computations on massively many-core systems. Our performance analysis on manycore systems shows that the unscalability bottleneck is related to Non-Uniform Memory Access (NUMA): memory bus contention and remote memory access latency. To overcome the bottleneck, we have designed NUMA-aware tile algorithms with the help of a dynamic...
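The abstract above cuts off mid-sentence, but the NUMA-aware idea it names is concrete enough to sketch: place each tile of the matrix in the memory of the NUMA node whose cores will work on it, so tasks mostly touch local memory instead of paying remote-access latency. Below is a minimal sketch using the Linux libnuma API; the tile size, tile count, and round-robin node mapping are illustrative assumptions, not the paper's actual framework.

```c
/* NUMA-aware tile allocation sketch: bind each tile to a NUMA node
 * so the tasks that own it mostly hit local memory.
 * Link with -lnuma on Linux. Tile size and the round-robin mapping
 * are illustrative assumptions, not the paper's actual scheme. */
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>

#define NB 256                      /* tile edge length (assumed) */

double *alloc_tile_on_node(int node)
{
    /* numa_alloc_onnode() returns page-aligned memory physically
     * placed on the requested NUMA node. */
    return numa_alloc_onnode(NB * NB * sizeof(double), node);
}

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "libnuma not available\n");
        return 1;
    }
    int nodes = numa_max_node() + 1;
    int nt = 4;                     /* 4x4 grid of tiles (assumed) */
    double *tiles[16];

    /* Round-robin tiles over NUMA nodes; a real scheduler would
     * instead place each tile where its owning tasks will run. */
    for (int i = 0; i < nt * nt; i++)
        tiles[i] = alloc_tile_on_node(i % nodes);

    for (int i = 0; i < nt * nt; i++)
        numa_free(tiles[i], NB * NB * sizeof(double));
    return 0;
}
```

Binding at allocation time matters because Linux's default first-touch policy would otherwise place every page on whichever node happened to initialize it, which is exactly what produces the bus contention and remote-access traffic the abstract identifies.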
Non-uniform memory access (NUMA) architectures are modern shared-memory, multi-core machines offerin...
This paper presents a new method to parallelize programs, adapted to manycore ...
Abstract: Few realize that, for large matrices, many dense matrix computations achieve nearly the sa...
We employ the dynamic runtime system OmpSs to decrease the overhead of data motion in the now ubiqui...
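OmpSs expresses kernels as tasks annotated with data directions (in/out/inout) so the runtime can order work around the data rather than moving the data around the work. Below is a minimal sketch of that dataflow style written in standard OpenMP, whose depend clauses play the role of OmpSs's annotations here; the kernel names and per-tile decomposition are illustrative, not taken from the paper.

```c
/* Dataflow tasking sketch in standard OpenMP (OmpSs expresses the
 * same idea with in/out clauses on its task directives). The runtime
 * derives task order from the declared data accesses, so independent
 * tiles run in parallel while data stays with the task that owns it.
 * Compile with -fopenmp. */
#include <stdio.h>

#define NT 4
static double tile[NT];             /* stand-in for NT matrix tiles */

void kernel_a(double *t) { *t += 1.0; }   /* illustrative kernels */
void kernel_b(double *t) { *t *= 2.0; }

int main(void)
{
    #pragma omp parallel
    #pragma omp single
    {
        for (int i = 0; i < NT; i++) {
            /* writes tile[i] */
            #pragma omp task depend(out: tile[i])
            kernel_a(&tile[i]);

            /* reads and writes tile[i]: runs after kernel_a on the
             * same tile, but different tiles proceed independently */
            #pragma omp task depend(inout: tile[i])
            kernel_b(&tile[i]);
        }
        #pragma omp taskwait
    }
    printf("tile[0] = %f\n", tile[0]);   /* (0 + 1) * 2 = 2 */
    return 0;
}
```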
Abstract. Traditional parallel programming methodologies for improving performance assume cache-bas...
Out-of-core implementations of algorithms for dense matrix computations have traditionally focused o...
Abstract. Moore’s Law suggests that the number of processing cores on a single chip increases expone...
Many-Task Computing (MTC) is a common scenario for multiple parallel systems, such as clusters, grids...
Embedded manycore architectures are often organized as fabrics of tightly-coupled shared memory clus...
This whitepaper studies the various aspects and challenges of performance scaling on large scale sha...
The sparse matrix-vector multiplication is an important kernel, but is hard to efficiently execute ...
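For context, the kernel in question is y = Ax with A sparse; the standard baseline keeps A in compressed sparse row (CSR) form and streams each row. A minimal sketch follows, using the conventional CSR array names rather than any particular paper's interface.

```c
/* Baseline CSR sparse matrix-vector product y = A*x. The irregular
 * x[col[k]] gather is what makes SpMV hard to execute efficiently:
 * accesses to x depend on the sparsity pattern, defeating caches
 * and prefetchers. Array names follow the usual CSR convention. */
#include <stdio.h>

void spmv_csr(int nrows, const int *rowptr, const int *col,
              const double *val, const double *x, double *y)
{
    for (int i = 0; i < nrows; i++) {
        double sum = 0.0;
        /* nonzeros of row i live in val[rowptr[i] .. rowptr[i+1]-1] */
        for (int k = rowptr[i]; k < rowptr[i + 1]; k++)
            sum += val[k] * x[col[k]];
        y[i] = sum;
    }
}

int main(void)
{
    /* 2x2 example: A = [[2, 0], [1, 3]], x = [1, 1] -> y = [2, 4] */
    int rowptr[] = {0, 1, 3};
    int col[]    = {0, 0, 1};
    double val[] = {2.0, 1.0, 3.0};
    double x[]   = {1.0, 1.0}, y[2];

    spmv_csr(2, rowptr, col, val, x, y);
    printf("y = [%g, %g]\n", y[0], y[1]);
    return 0;
}
```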
We present a new approach to utilizing all CPU cores and all GPUs on heterogeneous multicore and mul...
Multicore multiprocessors use Non-Uniform Memory Architecture (NUMA) to improve their scalability. ...
Sparse matrix-matrix multiplication (SpGEMM) is a computational primitive that is widely used in are...
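Since SpGEMM is only named above, a brief sketch of its textbook formulation may help: Gustavson's row-by-row scheme builds each row of C = A*B as a linear combination of the rows of B selected by the nonzeros of the corresponding row of A. The code below is that textbook sketch, not any specific paper's algorithm; matrix sizes and array names are illustrative.

```c
/* Gustavson-style SpGEMM sketch: C = A*B with all matrices in CSR.
 * A dense accumulator plus marker arrays builds each output row in
 * time proportional to the flops. Textbook formulation, tiny sizes. */
#include <stdio.h>
#include <stdlib.h>

void spgemm(int n,  /* n x n matrices; CSR arrays: ptr/col/val */
            const int *ap, const int *aj, const double *ax,
            const int *bp, const int *bj, const double *bx,
            int *cp, int *cj, double *cx)  /* caller-sized output */
{
    double *acc = calloc(n, sizeof(double)); /* dense row accumulator */
    int *mark = malloc(n * sizeof(int));     /* columns hit this row  */
    int *flag = malloc(n * sizeof(int));     /* last row to hit col j */
    for (int j = 0; j < n; j++) flag[j] = -1;

    int nnz = 0;
    cp[0] = 0;
    for (int i = 0; i < n; i++) {
        int len = 0;
        for (int t = ap[i]; t < ap[i + 1]; t++) {
            int k = aj[t];                   /* A(i,k) selects B(k,:) */
            for (int u = bp[k]; u < bp[k + 1]; u++) {
                int j = bj[u];
                if (flag[j] != i) {          /* first hit on column j */
                    flag[j] = i;
                    mark[len++] = j;
                }
                acc[j] += ax[t] * bx[u];
            }
        }
        for (int s = 0; s < len; s++) {      /* flush row i into C */
            cj[nnz] = mark[s];
            cx[nnz++] = acc[mark[s]];
            acc[mark[s]] = 0.0;
        }
        cp[i + 1] = nnz;
    }
    free(acc); free(mark); free(flag);
}

int main(void)
{
    /* A = [[1,0],[2,3]], B = [[4,5],[0,6]]  ->  C = [[4,5],[8,28]] */
    int ap[] = {0, 1, 3}, aj[] = {0, 0, 1};
    double ax[] = {1, 2, 3};
    int bp[] = {0, 2, 3}, bj[] = {0, 1, 1};
    double bx[] = {4, 5, 6};
    int cp[3], cj[4];
    double cx[4];

    spgemm(2, ap, aj, ax, bp, bj, bx, cp, cj, cx);
    for (int i = 0; i < 2; i++)
        for (int t = cp[i]; t < cp[i + 1]; t++)
            printf("C(%d,%d) = %g\n", i, cj[t], cx[t]);
    return 0;
}
```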