We have repurposed Google Tensor Processing Units (TPUs), application-specific chips developed for machine learning, into large-scale dense linear algebra supercomputers. The TPUs' fast inter-core interconnects (ICIs), physically two-dimensional network topology, and high-bandwidth memory (HBM) permit distributed matrix multiplication algorithms to rapidly become compute-bound. In this regime, the matrix-multiply units (MXUs) dominate the runtime, yielding impressive scaling, performance, and raw size: operating in float32 precision, a full 2048-core pod of third-generation TPUs can multiply two matrices of linear size $N = 2^{20} = 1\,048\,576$ in about two minutes. Via curated algorithms emphasizing large, single-core matrix multiplications, ...
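To make the compute-bound regime concrete, here is a minimal sketch (not the paper's implementation) of a distributed matrix multiply in JAX, sharding both operands across the available TPU cores so that each core's contribution is a large dense local matmul executed on its MXU; the matrix size and sharding layout are illustrative assumptions, and the full-pod $N = 2^{20}$ case would follow the same pattern at scale.

```python
# Hedged sketch: distribute a float32 matmul over TPU cores with JAX sharding.
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

devices = np.array(jax.devices())          # e.g. the TPU cores of one host/pod slice
mesh = Mesh(devices, axis_names=("x",))    # 1D device mesh for simplicity

n = 8192                                   # toy linear size, not the paper's 2**20
a = jnp.ones((n, n), dtype=jnp.float32)
b = jnp.ones((n, n), dtype=jnp.float32)

# Shard A by rows and B by columns across the mesh axis.
a = jax.device_put(a, NamedSharding(mesh, P("x", None)))
b = jax.device_put(b, NamedSharding(mesh, P(None, "x")))

@jax.jit
def matmul(a, b):
    # XLA inserts the required inter-core collectives over ICI; each core
    # then performs a dense local matrix multiplication on its MXU.
    return a @ b

c = matmul(a, b)
print(c.shape, c.dtype)
```

Because communication over ICI scales more slowly with $N$ than the $O(N^3)$ local arithmetic, a layout like this becomes compute-bound as the matrices grow, which is the regime the abstract describes.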
Abstract. Over the last century, linear algebra theory and matrix computations became irreplaceable,...
Integrating polyalgorithm library with optimized linear algebra libraries on HPC platforms, leveragi...
We show that the border support rank of the tensor corresponding to two-by-two matrix m...
To respond to the intense computational load of deep neural networks, a plethora of domain-specific ...
Abstract: Few realize that, for large matrices, many dense matrix computations achieve nearly the sa...
To respond to the need for efficient training and inference of deep neural networks, a plethora of d...
Tensor algebra lives at the heart of big data applications. Where classical machine learnin...
Achieving high performance while reducing power consumption is the key question as technology scali...
Popular Machine Learning (ML) and High Performance Computing (HPC) workloads contribute to a signifi...
This thesis targets the design of parallelizable algorithms and communication-efficient parallel sch...
Dense linear algebra computations are essential to nearly every problem in scientific computing and ...
Matrix multiplication is a core building block for numerous scientific computing and, more recently,...
A current trend in high-performance computing is to decompose a large linear algebra problem into ba...
Computationally intensive applications such as pattern recognition and natural language processing, a...
Since data sizes of analytical applications are continuously growing, many data scientists are switc...