In this paper we focus on optimizing compute- and memory-bandwidth-intensive GMM computations for low-end, small-form-factor devices that use GPU-like parallel processors. We place special emphasis on the memory bandwidth problem, which is exacerbated by the lack of CPU-like caches that exploit temporal locality on such processors, and propose modifications to three well-known GMM computation reduction techniques. We find considerable locality at the frame, CI-GMM, and mixture layers of the GMM computation, and show how it can be extracted with a chunk-based technique that processes multiple frames for every load of a GMM. On a 1,000-word, command-and-control, continuous-speech task, we are able to achieve compute and memory bandwid...
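The chunk-based idea of processing multiple frames for every load of a GMM can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `make_gmm`, `score_gmm`, and `score_chunked`, the diagonal-covariance representation, and the `loads` counter (a stand-in for memory traffic) are all assumptions made for illustration.

```python
import math

def make_gmm(weights, means, variances):
    """Build a hypothetical GMM as a list of (weight, mean, inv_var, log_const)."""
    gmm = []
    for w, mu, var in zip(weights, means, variances):
        inv_var = [1.0 / v for v in var]
        # Log of the diagonal-Gaussian normalisation term.
        log_const = -0.5 * (len(mu) * math.log(2.0 * math.pi)
                            + sum(math.log(v) for v in var))
        gmm.append((w, mu, inv_var, log_const))
    return gmm

def score_gmm(frame, gmm):
    """Log-likelihood of one frame under one GMM (log-sum-exp over mixtures)."""
    logs = []
    for w, mu, inv_var, log_const in gmm:
        dist = sum((x - m) ** 2 * iv for x, m, iv in zip(frame, mu, inv_var))
        logs.append(math.log(w) + log_const - 0.5 * dist)
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))

def score_chunked(frames, gmms, chunk_size):
    """Score all frames against all GMMs, fetching each GMM once per chunk.

    Returns (scores, loads), where loads counts simulated GMM fetches.
    """
    scores = [[0.0] * len(gmms) for _ in frames]
    loads = 0
    for start in range(0, len(frames), chunk_size):
        stop = min(start + chunk_size, len(frames))
        for g, gmm in enumerate(gmms):
            loads += 1  # one simulated fetch of this GMM's parameters
            for t in range(start, stop):  # reuse the fetched GMM for the chunk
                scores[t][g] = score_gmm(frames[t], gmm)
    return scores, loads
```

With a chunk size of 1 this degenerates to frame-by-frame scoring, so every GMM is fetched once per frame. With a chunk of `k` frames the number of fetches drops by roughly a factor of `k` while the scores stay identical, which is the locality the abstract refers to at the frame layer.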
General-purpose Graphics Processing Units (GPGPUs) are an important class of architectures that offe...
An emerging trend in processor architecture seems to indicate the doubling of the number of cores pe...
It is commonplace for graphics processing units or GPUs today to render extremely complex 3D scenes ...
Gaussian Mixture Model (GMM) computations in modern Automatic Speech Recognition systems are known t...
This paper presents a novel optimizing compiler for general purpose computation on graphics processi...
The massive parallelism provided by general-purpose GPUs (GPGPUs) possessing numerous compute thread...
Modern computers are not random access machines (RAMs). They have a memory hierarchy, multiple cores...
In the last three years, GPUs have increasingly been used for general-purpose applications instead ...
This paper presents results of our study on double-precision general matrix-matrix multiplic...
The continued growth of the computational capability of throughput processors has made throughput...
Enhancing the match between software executions and hardware features is key to computing efficiency...
General-purpose Graphics Processing Units (GPGPUs) have shown enormous promise in enabling high thro...