This paper addresses a critical problem in deploying a spoken dialog system (SDS). One of the main bottlenecks of SDS deployment for a new domain is data sparseness in building a statistical language model. Our goal is to devise a method to efficiently build a reliable language model for a new SDS. We consider the worst yet quite common scenario where only a small amount (∼1.7K utterances) of domain-specific data is available for the target domain. We present a new method that exploits external static text resources that are collected for other speech recognition tasks as well as dynamic text resources acquired from the World Wide Web (WWW). We show that language models built using external resources can jointly be used with limited in-domain (...
We introduce the Universal Speech Model (USM), a single large model that performs automatic speech r...
Text-to-speech synthesis (TTS) has progressed to such a stage that given a large, clean, phoneticall...
Language model fusion helps smart assistants recognize words which are rare in acoustic data but abu...
Spoken language speech recognition systems need better understanding of natura...
We describe the use of text data scraped from the web to augment language models for Automatic Speec...
This article describes a methodology for collecting text from the Web to match a target sublanguage ...
Generic speech recognition systems typically use language models that are trained to cope with a bro...
One particular problem in large vocabulary continuous speech recognition for low-resourced languages...
In this paper, we present an efficient query selection algorithm for the retrieval of web text data ...
We attempted to improve recognition accuracy by reducing the inadequacies of the lexicon and languag...
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Comp...
Training a language model from conversational speech is difficult due to the large variation of the w...
WOCCI 2008: The 1st Workshop on Child, Computer, and Interaction, October 23, 2008, Chania, Crete,...
Self-supervised representation learning (SSRL) has improved the performance on downstream phoneme re...
EUROSPEECH2001: the 7th European Conference on Speech Communication and Technology, September 3-7, ...