In this paper, several methods are proposed for reducing the size of a trigram language model (LM), often the largest data structure in a continuous speech recognizer, without affecting its performance. The common factor shared by the different approaches is selecting only a subset of the available trigrams, trying to identify those trigrams that contribute most to the performance of the full trigram LM. The proposed selection criteria apply to trigram contexts of length one or two. These criteria rely on information-theoretic concepts, on the back-off probabilities estimated by the LM, or on a measure of the phonetic/linguistic uncertainty relative to a given context. Performance of the reduced trigram LMs is compared both in t...
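The trigram-subset idea above can be illustrated with a minimal sketch. This is not the paper's algorithm; it assumes a Seymore/Rosenfeld-style criterion in which each trigram is scored by its context frequency times the log-probability gap between the explicit trigram estimate and the back-off estimate, and all names and parameters are hypothetical:

```python
from math import log

def select_trigrams(trigram_probs, backoff_probs, context_counts, threshold):
    """Keep only trigrams whose removal would most change the model.

    Hypothetical scoring: weight the log-probability difference between the
    explicit trigram estimate and the back-off (bigram) estimate by how often
    the trigram's context occurs in the training data.
    """
    kept = {}
    for (w1, w2, w3), p in trigram_probs.items():
        p_backoff = backoff_probs[(w2, w3)]          # estimate without the trigram
        score = context_counts[(w1, w2)] * (log(p) - log(p_backoff))
        if score > threshold:
            kept[(w1, w2, w3)] = p
    return kept
```

Trigrams whose explicit probability barely differs from the back-off estimate score near zero and are dropped first.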
In this paper, an extension of n-grams is proposed. In this extension, the memory of the model (n) i...
Speech recognition relies on the language model in order to decode an utterance, and in general a be...
In contrast to conventional n-gram approaches, which are the most used languag...
ICSLP1998: the 5th International Conference on Spoken Language Processing, November 30 - December 4...
Introduction At the current state of the art, high-accuracy speech recognition with moderate to lar...
This PhD thesis studies the overall effect of statistical language modeling on perplexity and word e...
A new language model for speech recognition inspired by linguistic analysis is presented. The model ...
We propose a novel method for using the World Wide Web to acquire trigram estimates for statistical ...
When a trigram backoff language model is created from a large body of text, trigrams and bigrams tha...
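The pruning setting described above can be sketched with the simplest baseline, count cutoffs: n-grams seen at or below a cutoff in the training text are dropped before the back-off model is built. This is a generic illustration, not the cited paper's method, and the parameter names are assumptions:

```python
def prune_by_cutoff(trigram_counts, bigram_counts, tri_cutoff=1, bi_cutoff=0):
    """Drop n-grams whose training count is at or below a cutoff.

    Minimal count-cutoff pruning sketch (hypothetical parameters): rarely
    seen trigrams and bigrams carry little probability mass, so removing
    them shrinks the model at a small cost in accuracy.
    """
    trigrams = {g: c for g, c in trigram_counts.items() if c > tri_cutoff}
    bigrams = {g: c for g, c in bigram_counts.items() if c > bi_cutoff}
    return trigrams, bigrams
```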
Computational approaches in language identification often result in a high number of false positives a...
This paper compares different ways of estimating bigram language models and of representing them in ...
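One of the estimation methods such a comparison typically covers is maximum likelihood with add-alpha smoothing, sketched below. The function and the `alpha` knob are illustrative assumptions, not details from the paper:

```python
from collections import Counter

def bigram_probs(tokens, alpha=1.0):
    """Estimate add-alpha-smoothed bigram probabilities from a token list.

    P(w2 | w1) = (count(w1, w2) + alpha) / (count(w1) + alpha * |V|)
    """
    unigrams = Counter(tokens[:-1])              # contexts, excluding final token
    bigrams = Counter(zip(tokens, tokens[1:]))   # adjacent token pairs
    vocab = len(set(tokens))
    return {
        (w1, w2): (c + alpha) / (unigrams[w1] + alpha * vocab)
        for (w1, w2), c in bigrams.items()
    }
```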
In a series of preparatory experiments in 4 languages on subsets of the Europa...
In this paper, we investigate language models based on stochastic context-free grammar (SCFG), bigram a...
In this paper we present two new techniques for language modeling in speech recognition. The first tec...