EUROSPEECH2001: the 7th European Conference on Speech Communication and Technology, September 3-7, 2001, Aalborg, Denmark.
This paper describes the automatic building of N-gram language models from Web texts for large-vocabulary continuous speech recognition. Although a huge amount of well-formed text is needed to train a model, collecting and organizing such a text corpus by hand for every task is very laborious. The language model also needs to be updated frequently to cover current topics. To deal with this problem, we propose an automatic language model creation method that collects Web texts via keyword-based Web search engines. A task-dependent language model can be built by selecting suitable keywords for the task. A text filtering al...
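As a rough illustration of the pipeline outlined in the abstract above (keyword-based selection of texts followed by N-gram estimation), the following minimal sketch trains a bigram model on texts that mention a task keyword. The function names, the substring keyword match, the whitespace tokenization, and the add-one smoothing are all assumptions made for illustration; the abstract does not specify the search, filtering, or smoothing methods actually used.

```python
from collections import Counter


def select_texts(texts, keywords):
    """Keep texts mentioning at least one task keyword (hypothetical stand-in
    for retrieving documents through a keyword-based Web search engine)."""
    lowered = [k.lower() for k in keywords]
    return [t for t in texts if any(k in t.lower() for k in lowered)]


def train_bigram(texts):
    """Count bigrams and their histories over whitespace tokens,
    adding sentence boundary markers to each text."""
    history = Counter()
    bigrams = Counter()
    vocab = set()
    for text in texts:
        tokens = ["<s>"] + text.lower().split() + ["</s>"]
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            history[prev] += 1
            bigrams[(prev, cur)] += 1
    return history, bigrams, vocab


def bigram_prob(prev, cur, history, bigrams, vocab):
    """Add-one smoothed estimate of P(cur | prev) (smoothing choice is an assumption)."""
    return (bigrams[(prev, cur)] + 1) / (history[prev] + len(vocab))


if __name__ == "__main__":
    collected = [
        "Speech recognition needs language models",
        "Cooking pasta is easy",
        "Language models need text",
    ]
    task_texts = select_texts(collected, ["language"])
    history, bigrams, vocab = train_bigram(task_texts)
    print(bigram_prob("language", "models", history, bigrams, vocab))
```

Changing the keyword list changes which texts are selected, and with them the model, which is the sense in which a model built this way is task-dependent.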
This paper reports on investigations using two techniques for language model t...
The searching of data Process end users search their data needs using query representation, by using...
In the speech recognition of highly inflecting or compounding languages, the traditional word-based ...
This article describes a methodology for collecting text from the Web to match a target sublanguage ...
For low resource languages, collecting sufficient training data to build acoustic and language model...
We describe the use of text data scraped from the web to augment language models for Automatic Speec...
The design and construction of a language model for minority languages is a ha...
Spoken language speech recognition systems need better understanding of natura...
The design and construction of a language model for minority languages is a hard task. By minority l...
Conversational text is highly varied, and many abbreviations and short forms exist in different la...
Language models used in current automatic speech recognition systems are trained on general-purpose ...
Training a language model from conversational speech is difficult due to the large variation of the w...
This paper presents SwissCrawl, the largest Swiss German text corpus to date. Composed of more than ...
In this paper, we present an efficient query selection algorithm for the retrieval of web text data ...
One particular problem in large vocabulary continuous speech recognition for low-resourced languages...