Subword units are commonly used for end-to-end automatic speech recognition (ASR), yet a fully acoustic-oriented subword modeling approach is largely missing. We propose an acoustic data-driven subword modeling (ADSM) approach that combines the advantages of several text-based and acoustic-based subword methods in a single pipeline. With a fully acoustic-oriented label design and learning process, ADSM produces acoustically structured subword units and acoustically matched target sequences for subsequent ASR training. The resulting ADSM labels are evaluated with several end-to-end ASR approaches, including CTC, RNN-Transducer, and attention models. Experiments on the LibriSpeech corpus show that ADSM clearly outperforms both byte pair encoding (BPE) and p...
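For contrast with ADSM, the text-based BPE baseline named above can be sketched in a few lines. This is a minimal, assumption-laden sketch of standard BPE merge learning on word frequencies (the usual text-only formulation), not the ADSM pipeline, whose acoustic steps are not described in this excerpt. The function name learn_bpe and the "</w>" end-of-word marker are illustrative choices.

```python
from collections import Counter


def learn_bpe(word_counts, num_merges):
    """Learn BPE merge operations from a {word: count} dictionary.

    Sketch of text-based BPE only; "</w>" is an illustrative end-of-word marker.
    """
    # Represent every word as a tuple of characters plus the end marker.
    vocab = {tuple(word) + ("</w>",): count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, count in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break
        # Most frequent pair; ties go to the pair counted first.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word, replacing occurrences of the best pair.
        new_vocab = {}
        for symbols, count in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            key = tuple(merged)
            new_vocab[key] = new_vocab.get(key, 0) + count
        vocab = new_vocab
    return merges, vocab
```

On the toy counts {"low": 5, "lower": 2, "newest": 6, "widest": 3}, three merges yield ("e", "s"), ("es", "t"), ("est", "</w>"). The units are chosen purely from text statistics, which is the property an acoustic-oriented method such as ADSM is meant to move away from.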
Thesis (Master's)--University of Washington, 2016-12. Given the vast amount of textual data that we ha...
Because in agglutinative languages the number of observed word forms is very high, subword units are...
Current automatic speech recognition (ASR) research is focused on recognition of continuous, sponta...
In today's society, speech recognition systems have reached a mass audience, especially in the field...
Models of acoustic word embeddings (AWEs) learn to map variable-length spoken word segments onto fix...
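The core idea of mapping a variable-length segment to a fixed-dimensional vector can be illustrated with a simple non-learned baseline, uniform downsampling: keep k evenly spaced frames and concatenate them. This sketch assumes each segment is a list of D-dimensional feature frames. It is not the learned AWE model referred to above, and the function name and default k are hypothetical.

```python
def downsample_embedding(frames, k=10):
    """Map a variable-length list of D-dim frames to one k*D vector.

    Hypothetical baseline: pick k (approximately) evenly spaced frames
    and concatenate them; frames are repeated when the segment is short.
    """
    n = len(frames)
    if n == 0:
        raise ValueError("segment must contain at least one frame")
    # Evenly spaced frame indices covering the whole segment [0, n-1].
    if k == 1:
        indices = [0]
    else:
        indices = [round(i * (n - 1) / (k - 1)) for i in range(k)]
    embedding = []
    for idx in indices:
        embedding.extend(frames[idx])
    return embedding
```

A 50-frame and a 3-frame segment with 13-dimensional features both map to 130-dimensional vectors when k=10, so they can be compared directly, for example with cosine distance. Learned AWE models aim to produce such fixed-size vectors that also discriminate between words.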
Exploiting effective target modeling units is very important and has always been a concern in end-to...
Modern language models mostly take sub-words as input, a design that balances the trade-off between ...
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer...
The vast majority of ASR research uses corpora in which both the training and test data have been pr...
For a language with no transcribed speech available (the zero-resource scenario), conventional acous...
We experiment with subword segmentation approaches that are widely used to address the open vocabula...
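A fixed subword inventory addresses the open-vocabulary problem by letting unseen words be split into known pieces. Below is a minimal sketch of greedy longest-match-first segmentation (the inference-time strategy of WordPiece-style tokenizers), assuming a hypothetical toy vocabulary and falling back to single characters. It is not necessarily the segmentation used in the work above.

```python
def segment(word, vocab):
    """Greedy longest-match-first segmentation of `word` into subwords.

    Characters not covered by `vocab` are emitted individually, so any
    word can be segmented (the open-vocabulary property).
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate until it is in the vocabulary or one char long.
        while end > start + 1 and word[start:end] not in vocab:
            end -= 1
        pieces.append(word[start:end])
        start = end
    return pieces


toy_vocab = {"talk", "walk", "ing", "ed", "er"}  # hypothetical vocabulary
```

With this toy vocabulary, segment("walkers", toy_vocab) returns ["walk", "er", "s"], even though "walkers" itself is not in the vocabulary.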
Utilizing text-only data with an external language model (ELM) in end-to-end RNN-Transducer (RNN-T) ...
Over the past several years, I have been conducting research on subword modeling in speech recogniti...
Spiess T, Wrede B, Kummert F, Fink GA. Data-driven Pronunciation Modeling for ASR using Acoustic Sub...
We introduce the Universal Speech Model (USM), a single large model that performs automatic speech r...