Tensor Cores have been an important unit for accelerating fused matrix multiply-accumulate (MMA) operations on NVIDIA GPUs since the Volta architecture. To program Tensor Cores, users must use either the legacy wmma APIs or the current mma APIs. The legacy wmma APIs are easier to use but can exploit only a limited subset of the features and performance of Tensor Cores. Specifically, the wmma APIs support fewer operand shapes and cannot leverage the new sparse matrix multiplication feature of the newest Ampere Tensor Cores. However, the performance of the current programming interface has not been well explored. Furthermore, the numerical behavior of the low-precision floating-point formats (TF32, BF16, and FP16) supported by the newest Ampere Tensor Cores remains poorly understood. In t...
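To make the TF32, BF16, and FP16 formats concrete, the following Python sketch emulates rounding a value to each of them in software. It is a minimal model under stated assumptions, not a description of the hardware: it assumes round-to-nearest-even on conversion and handles finite, in-range inputs only, and the helper names (`round_mantissa`, `to_fp16`, `f32_bits`, `bits_f32`) are hypothetical. Real GPU conversion instructions may use other rounding modes, which is the kind of numerical behavior the abstract sets out to characterize.

```python
import struct

def f32_bits(x: float) -> int:
    """Bit pattern of x after rounding it to IEEE-754 binary32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_f32(b: int) -> float:
    """Interpret the low 32 bits of b as an IEEE-754 binary32 value."""
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]

def round_mantissa(x: float, keep_bits: int) -> float:
    """Round x (via FP32) to `keep_bits` explicit mantissa bits, keeping FP32's
    8-bit exponent. keep_bits=10 models TF32, keep_bits=7 models BF16.
    Assumption: round-to-nearest-even, finite and in-range inputs only."""
    drop = 23 - keep_bits                 # low mantissa bits to discard
    b = f32_bits(x)
    lsb = (b >> drop) & 1                 # lowest kept bit, used to break ties to even
    bias = (1 << (drop - 1)) - 1 + lsb    # just under half an ULP, plus lsb
    b = ((b + bias) >> drop) << drop      # round, then clear the discarded bits
    return bits_f32(b)

def to_fp16(x: float) -> float:
    """Round x to IEEE-754 binary16 (5-bit exponent, 10-bit mantissa)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

if __name__ == "__main__":
    x = 1.0 / 3.0
    for name, v in [("FP32", bits_f32(f32_bits(x))),
                    ("TF32", round_mantissa(x, 10)),
                    ("BF16", round_mantissa(x, 7)),
                    ("FP16", to_fp16(x))]:
        print(f"{name}: {v!r}  abs err = {abs(v - x):.3e}")
```

TF32 and BF16 both keep FP32's 8-bit exponent and differ only in how many mantissa bits survive, so FP32 data can be converted to them by mantissa rounding alone; FP16 instead trades exponent range for mantissa bits.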
Thesis (Master's)--University of Washington, 2019. Previous work has developed a tool, the Tensor Temp...
Tensor decomposition (TD) is an important method for extracting latent information from high-dimensi...
Tensor cores (TCs) are a type of Application-Specific Integrated Circuit (ASIC) and are a recent add...
NVIDIA Tensor Core is a mixed-precision matrix-matrix multiplication and addition computing unit, wh...
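The mixed-precision multiply-add is commonly written as D = A×B + C, with low-precision inputs A and B and a higher-precision accumulator. A minimal Python sketch of that contract follows, assuming FP16 inputs, an FP32 accumulator, products added one at a time in index order, and round-to-nearest-even after every addition. The function names are hypothetical, and real Tensor Cores may sum products in a different order or with different internal precision and rounding.

```python
import struct

def fp16(x: float) -> float:
    """Round x to IEEE-754 binary16 (round-to-nearest-even)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def fp32(x: float) -> float:
    """Round x to IEEE-754 binary32 (round-to-nearest-even)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def mma_fp16_fp32(A, B, C):
    """Idealized D = A*B + C with A, B rounded to FP16 and an FP32 accumulator.

    Assumption: products are accumulated sequentially, rounding to FP32 after
    each addition. The product of two FP16 values fits exactly in FP32, so
    only the additions introduce rounding error in this model."""
    n, k, m = len(A), len(B), len(B[0])
    assert all(len(row) == k for row in A), "inner dimensions must match"
    A16 = [[fp16(v) for v in row] for row in A]
    B16 = [[fp16(v) for v in row] for row in B]
    D = []
    for i in range(n):
        row = []
        for j in range(m):
            acc = fp32(C[i][j])
            for p in range(k):
                acc = fp32(acc + A16[i][p] * B16[p][j])
            row.append(acc)
        D.append(row)
    return D

# The FP32 accumulator keeps 1 + 2**-11, which an FP16 accumulator would
# round back to 1.0 (a tie, resolved to even).
print(mma_fp16_fp32([[1.0, 2**-11]], [[1.0], [1.0]], [[0.0]]))
```

Keeping the accumulator wider than the inputs is the design choice that lets small products survive a long sum; the inputs themselves still lose precision when rounded to FP16 (for example, 0.1 becomes 0.0999755859375).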
We explore the floating-point arithmetic implemented in the NVIDIA tensor cores, which are hardware ...
There has been a surge in demand for Domain-Specific Architectures due to wide-ranging deep lea...
Tensor Core is a mixed-precision matrix-matrix multiplication unit on NVIDIA GPUs with a theoretical...
We explore the floating-point arithmetic used by the NVIDIA Volta tensor cores, which are hardware a...
Tensor algorithms are a rapidly growing field of research with applications in many scientific domai...
We present a computational framework for high-performance tensor contractions on GPUs. High-...
140 pages. Tensor algebra lives at the heart of big data applications. Where classical machine learnin...
Tensors are higher-dimensional analogs of matrices, and represent a key data abstraction for many ap...
In multiword arithmetic, a matrix is represented as the unevaluated sum of two or more lower-precisi...
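The multiword idea can be illustrated with two BF16 words per value: each matrix is split as M ≈ M_hi + M_lo, with M_hi = bf16(M) and M_lo = bf16(M - M_hi), and the product is assembled from the three largest cross terms, A·B ≈ A_hi·B_hi + A_hi·B_lo + A_lo·B_hi. The Python sketch below is a minimal illustration under stated assumptions: BF16 as the low-precision format, two words, the A_lo·B_lo term dropped, and the partial products evaluated in double precision rather than on Tensor Cores. The helper names (`bf16`, `split`, `multiword_matmul`) are hypothetical, and this is not necessarily the exact scheme analysed in the abstract.

```python
import struct

def bf16(x: float) -> float:
    """Round x to bfloat16 (8-bit exponent, 7-bit mantissa), round-to-nearest-even.
    x is first rounded to FP32; finite, in-range inputs only."""
    b = struct.unpack("<I", struct.pack("<f", x))[0]
    b = (b + 0x7FFF + ((b >> 16) & 1)) & 0xFFFF0000   # round, keep the top 16 bits
    return struct.unpack("<f", struct.pack("<I", b))[0]

def split(M, words=2):
    """Return [M_0, ..., M_{words-1}], BF16 matrices whose unevaluated sum
    approximates M; each word captures the residual left by the previous ones."""
    parts, R = [], [row[:] for row in M]
    for _ in range(words):
        P = [[bf16(v) for v in row] for row in R]
        parts.append(P)
        R = [[r - p for r, p in zip(rr, pp)] for rr, pp in zip(R, P)]
    return parts

def matmul(A, B):
    """Plain double-precision matrix product (stand-in for one low-precision MMA)."""
    return [[sum(A[i][p] * B[p][j] for p in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def add(X, Y):
    """Elementwise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def multiword_matmul(A, B):
    """A*B ~ Ah*Bh + Ah*Bl + Al*Bh; the smallest term Al*Bl is dropped."""
    Ah, Al = split(A)
    Bh, Bl = split(B)
    return add(add(matmul(Ah, Bh), matmul(Ah, Bl)), matmul(Al, Bh))

A = [[1/3, 1/7]]
B = [[1/5], [1/9]]
exact = 1/15 + 1/63
one_word = matmul(split(A, 1)[0], split(B, 1)[0])[0][0]
two_word = multiword_matmul(A, B)[0][0]
print(f"one BF16 word:  error = {abs(one_word - exact):.2e}")
print(f"two BF16 words: error = {abs(two_word - exact):.2e}")
```

Each additional word costs extra low-precision products (three instead of one here) in exchange for roughly doubling the number of significant bits carried through the computation.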
Popular Machine Learning (ML) and High Performance Computing (HPC) workloads contribute to a signifi...
Low-precision floating-point arithmetic is a powerful tool for accelerating scientific computing app...