When a trigram backoff language model is created from a large body of text, trigrams and bigrams that occur few times in the training text are often excluded from the model in order to decrease the model size. The elimination of n-grams with very low counts is generally believed not to significantly affect model performance. This project investigates the degradation of a trigram backoff model’s perplexity and word error rates as bigram and trigram cutoffs are increased. The benefit of the reduction in model size is weighed against the increase in word error rate and perplexity. More importantly, this project also investigates alternative ways of excluding bigrams and trigrams from a backoff language model, using criteria other than the n...
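The count-cutoff pruning described above can be sketched as follows. This is a minimal illustration, not the project's actual method: the corpus, function names, and the cutoff value of 1 are all hypothetical, chosen only to show how n-grams below a count threshold are dropped from the model's tables.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count all n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def apply_cutoff(counts, cutoff):
    """Drop n-grams whose count is at or below the cutoff,
    as is commonly done when shrinking a backoff model."""
    return Counter({ng: c for ng, c in counts.items() if c > cutoff})

# Toy corpus (hypothetical); real cutoffs are applied to millions of n-grams.
corpus = "the cat sat on the mat the cat sat on the rug".split()
trigrams = ngram_counts(corpus, 3)
pruned = apply_cutoff(trigrams, 1)  # keep only trigrams seen more than once
```

In a full backoff model, probabilities for the pruned trigrams would then be estimated by backing off to the surviving bigram and unigram tables.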
We propose a novel method for using the World Wide Web to acquire trigram estimates for statistical ...
In this paper, we describe how to decide whether an n-gram is actually impossible in a ...
Previous attempts to automatically determine multi-words as the basic unit for language modeling hav...
ICSLP1998: the 5th International Conference on Spoken Language Processing, November 30 - December 4...
In this paper several methods are proposed for reducing the size of a trigram language model (LM), w...
We introduce factored language models (FLMs) and generalized parallel backoff (GPB). An FLM represen...
This paper describes a novel approach of compressing large trigram language models, which uses scala...
A language model combining word-based and category-based ngrams within a backoff framework is presen...
In this paper, an extension of n-grams, called x-grams, is proposed. In this extension, the memory o...
In this paper w...
The recent availability of large corpora for training N-gram language models has shown the utility o...
Data sparsity is a large problem in natural language processing that refers to the fact that languag...
This PhD thesis studies the overall effect of statistical language modeling on perplexity and word e...
In this paper, an extension of n-grams is proposed. In this extension, the memory of the model (n) i...