Three-Layer Optimizations for Fast GMM Computations on GPU-like Parallel Processors

Gupta, Kshitij
Owens, John D.

Publication date

January 2009

Publisher

eScholarship, University of California

Abstract

In this paper we focus on optimizing compute- and memory-bandwidth-intensive Gaussian mixture model (GMM) computations for low-end, small-form-factor devices running on GPU-like parallel processors. With special emphasis on tackling the memory bandwidth issue, which is exacerbated by the lack of CPU-like caches providing temporal locality on GPU-like parallel processors, we propose modifications to three well-known GMM computation reduction techniques. We find considerable locality at the frame, CI-GMM, and mixture layers of GMM compute, and show how it can be extracted by following a chunk-based technique of processing multiple frames for every load of a GMM. On a 1,000-word, command-and-control, continuous-speech task, we are able to achieve compute and memory bandwid...
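The chunk-based idea mentioned in the abstract, scoring several frames for every load of a GMM, can be illustrated with a minimal CPU-side sketch. This is not the authors' GPU implementation and does not model the three optimization layers; it only shows how reusing a fetched GMM across a chunk of frames reduces the number of parameter fetches. The function names, the diagonal-covariance assumption, and the `loads` counter (a stand-in for memory traffic) are all hypothetical choices made for illustration.

```python
import math

def log_gaussian_diag(frame, mean, var):
    """Log density of one diagonal-covariance Gaussian at one feature frame."""
    total = 0.0
    for x, m, v in zip(frame, mean, var):
        total += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
    return total

def gmm_log_likelihood(frame, gmm):
    """Log-likelihood of a frame under a GMM given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gaussian_diag(frame, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def score_chunked(frames, gmms, chunk_size):
    """Score every frame against every GMM, fetching each GMM once per chunk.

    Returns (scores, loads): scores[t][g] is the log-likelihood of frame t
    under GMM g; loads counts simulated GMM parameter fetches (an assumed
    proxy for memory traffic, not a hardware measurement).
    """
    scores = [[0.0] * len(gmms) for _ in frames]
    loads = 0
    for start in range(0, len(frames), chunk_size):
        chunk = range(start, min(start + chunk_size, len(frames)))
        for g, gmm in enumerate(gmms):
            loads += 1  # one simulated fetch of this GMM's parameters
            for t in chunk:  # reuse the fetched parameters for the whole chunk
                scores[t][g] = gmm_log_likelihood(frames[t], gmm)
    return scores, loads
```

With `chunk_size=1` every frame triggers a fetch of every GMM, whereas a larger chunk produces the same scores with proportionally fewer fetches. In this toy model the trade-off is that scores for a chunk only become available once all of its frames have been buffered.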
