In this paper we focus on optimizing compute- and memory-bandwidth-intensive GMM computations for low-end, small-form-factor devices running on GPU-like parallel processors. With special emphasis on the memory bandwidth issue, which is exacerbated by the lack of CPU-like caches providing temporal locality on GPU-like parallel processors, we propose modifications to three well-known GMM computation reduction techniques. We find considerable locality at the frame, CI-GMM, and mixture layers of GMM computation, and show how it can be exploited with a chunk-based technique that processes multiple frames for every load of a GMM. On a 1,000-word, command-and-control, continuous-speech task, we are able to achieve compute and memory bandwid...
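The chunk-based idea described in this abstract is to reuse each GMM load across several frames. The following is a minimal sketch of that idea. It assumes diagonal-covariance mixtures, which are common in ASR but not stated in the abstract. The names `DiagGMM`, `gmm_log_likelihood`, and `score_chunked` are hypothetical. A simple counter stands in for parameter fetches. On a GPU-like processor these fetches would be transfers from off-chip memory, which this pure-Python code does not model.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class DiagGMM:
    """Hypothetical container for a diagonal-covariance Gaussian mixture."""
    log_weights: List[float]        # log w_m for each mixture component m
    means: List[List[float]]        # mu_{m,d}
    variances: List[List[float]]    # sigma^2_{m,d} (assumed diagonal covariance)


def _logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def gmm_log_likelihood(gmm: DiagGMM, frame: Sequence[float]) -> float:
    """Return log p(frame | gmm) for a single feature vector."""
    per_mixture = []
    for log_w, mu, var in zip(gmm.log_weights, gmm.means, gmm.variances):
        acc = log_w
        for x, m, v in zip(frame, mu, var):
            acc -= 0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        per_mixture.append(acc)
    return _logsumexp(per_mixture)


def score_chunked(
    gmms: List[DiagGMM], frames: List[List[float]], chunk_size: int
) -> Tuple[List[List[float]], int]:
    """Score every frame against every GMM, loading each GMM once per chunk.

    Returns (scores, gmm_loads), where scores[t][g] = log p(frames[t] | gmms[g])
    and gmm_loads counts how many times a GMM's parameters were "fetched".
    """
    scores = [[0.0] * len(gmms) for _ in frames]
    gmm_loads = 0
    for start in range(0, len(frames), chunk_size):
        chunk = frames[start:start + chunk_size]
        for g, gmm in enumerate(gmms):
            gmm_loads += 1  # one fetch of this GMM serves every frame in the chunk
            for offset, frame in enumerate(chunk):
                scores[start + offset][g] = gmm_log_likelihood(gmm, frame)
    return scores, gmm_loads
```

With `chunk_size=1` the loop reduces to frame-by-frame scoring and performs T·G GMM loads for T frames and G GMMs. With a chunk size of C it performs ⌈T/C⌉·G loads and still produces identical scores. The cost of the larger chunk is that C frames must be buffered before scoring of that chunk can begin.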
This paper presents a novel optimizing compiler for general purpose computation on graphics processi...
Modern computers are not random access machines (RAMs). They have a memory hierarchy, multiple cores...
Gaussian Mixture Model (GMM) computations in modern Automatic Speech Recognition systems are known t...
© 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for a...
Automatic Speech Recognition (ASR) is one of the most important applications in the area of cognitiv...
Automatic speech recognition (ASR) is a very demanding computing task. Much research has been done i...
Automatic Speech Recognition (ASR) is becoming increasingly ubiquitous, especially in the mobile seg...
General-purpose processors propel the advances and innovations that are the subject of hum...
In the last three years, GPUs have increasingly been used for general-purpose applications instead ...
With the massive multithreading execution feature, graphics processing units (GPUs) have b...
An emerging trend in processor architecture seems to indicate the doubling of the number of cores pe...
The recent advent of high-throughput sequencing machines producing large amounts of short reads has bo...
Enhancing the match between software executions and hardware features is key to computing efficiency...