In a series of preparatory experiments in four languages on subsets of the Europarl corpus, we show that a large number of unseen trigrams can be reconstructed by proportional analogy from the trigrams with the lowest frequencies. From this empirical result we derive a very simple smoothing scheme and show that it outperforms the Good-Turing and Kneser-Ney smoothing schemes on trigram models in all 11 languages of the common multilingual part of the Europarl corpus, except Finnish.
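To make the idea of reconstructing unseen trigrams by proportional analogy concrete, here is a minimal sketch. It is not the authors' procedure: it assumes a simplified word-level analogy a : b :: c : d solved position by position, and the names `solve_analogy`, `reconstruct_unseen`, and `max_count`, as well as the toy counts, are hypothetical and introduced only for illustration.

```python
from collections import Counter
from itertools import permutations


def solve_analogy(a, b, c):
    """Solve a : b :: c : d for word trigrams, one position at a time.

    Assumption (not from the abstract): at each position either a and b
    agree (so d copies c) or a and c agree (so d copies b). Returns the
    trigram d, or None when some position has no solution.
    """
    d = []
    for x, y, z in zip(a, b, c):
        if x == y:
            d.append(z)
        elif x == z:
            d.append(y)
        else:
            return None
    return tuple(d)


def reconstruct_unseen(counts, max_count=1):
    """Return trigrams absent from `counts` that complete an analogy
    formed by three trigrams whose frequency is at most `max_count`."""
    rare = [t for t, n in counts.items() if n <= max_count]
    candidates = set()
    # Brute force over ordered triples: fine for a toy example only.
    for a, b, c in permutations(rare, 3):
        d = solve_analogy(a, b, c)
        if d is not None and d not in counts:
            candidates.add(d)
    return candidates


# Toy counts (illustrative data, not taken from Europarl).
counts = Counter({
    ("the", "cat", "sleeps"): 1,
    ("the", "dog", "sleeps"): 1,
    ("the", "cat", "runs"): 1,
})
print(reconstruct_unseen(counts))
```

On these toy counts the only reconstructed candidate is the unseen trigram "the dog runs". A smoothing scheme in the spirit of the abstract would then reserve some probability mass for such candidates; how that mass is assigned is not stated here, so the sketch stops at candidate generation.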
In this paper w...
When a trigram backoff language model is created from a large body of text, trigrams and bigrams th...
This paper describes an extension of the n-gram language model: the similar n-...
Computational approaches in language identification often result in high number of false positives a...
In this paper several methods are proposed for reducing the size of a trigram language model (LM), w...
This paper deals with the combination of a trigram and a triclass. This combin...
We propose a novel method for using the World Wide Web to acquire trigram estimates for statistical ...
This PhD thesis studies the overall effect of statistical language modeling on perplexity and word e...
ICSLP1998: the 5th International Conference on Spoken Language Processing, November 30 - December 4...
In this paper, we investigate the language models by stochastic context-free grammar (SCFG), bigram a...
We introduce a novel approach for building language models based on a systematic, recursive explorat...
We present a tutorial introduction to n-gram models for language modeling and survey the most widely...
In this study, we present Dice's coefficient on trigram profiles as metric for language similarit...
this paper appears in Proceedings of the Third International Workshop on Parsing Technologies, 1993