In the context of confusible disambiguation (spelling correction that requires context), the synchronous back-off strategy combined with traditional n-gram language models performs well. However, when the alternatives consist of different numbers of tokens, this classification technique cannot be applied directly, because the computation of the probabilities is skewed. Previous work has already shown that probabilities based on n-grams of different orders should not be compared directly. In this article, we propose new probability metrics in which the size of n is varied according to the number of tokens in the confusible alternative. This requires access to n-grams of variable length. Results show that the synchronous back-off method is extremely...
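As a rough illustration of the idea summarised above (matching the n-gram order to the token length of each confusible alternative, so that every alternative is scored over the same amount of surrounding context), the following Python sketch is one possible reading; the toy corpus, the add-one smoothing, and the simple normalisation are assumptions for illustration, not the metrics proposed in the article.

    from collections import Counter

    def ngram_counts(corpus_tokens, n):
        # Count every n-gram of a fixed order n in a tokenised corpus.
        return Counter(tuple(corpus_tokens[i:i + n])
                       for i in range(len(corpus_tokens) - n + 1))

    def score_alternative(corpus_tokens, left, alt, right, context=1):
        # Score one alternative on a span of `context` tokens of left and right
        # context plus the alternative itself. Because `alt` may span one or
        # several tokens, the n-gram order is adapted to len(alt): every
        # alternative is matched against the same amount of surrounding context
        # rather than the same fixed n.
        span = tuple(left[-context:] + alt + right[:context])
        counts = ngram_counts(corpus_tokens, len(span))
        total = sum(counts.values()) or 1
        return (counts[span] + 1) / (total + 1)   # naive add-one smoothing (assumption)

    # Hypothetical usage: choose between a one-token and a two-token alternative.
    corpus = "a lot of people think a lot about these things".split()
    left, right = ["think"], ["about"]
    alternatives = [["alot"], ["a", "lot"]]
    print(max(alternatives, key=lambda a: score_alternative(corpus, left, a, right)))

Adapting the span length rather than the raw n keeps the competing probabilities defined over comparably sized events, which is the skew the abstract attributes to fixed-order models.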
Compounding is one of the most productive word formation processes in many languages an...
Conventional n-gram language models are well-established as powerful yet simple mechanisms for chara...
How cross-linguistically applicable are NLP models, specifically language models? A fair comparison ...
In this article, we propose the use of suffix arrays to implement n-gram language models with practic...
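The entry is truncated here, but as a hedged sketch of the general technique it names, a suffix array over the training tokens lets a language model look up counts of arbitrary-length n-grams by binary search instead of storing a separate table per order. The naive construction and toy data below are illustrative assumptions, not the article's implementation (the key= argument of bisect requires Python 3.10+).

    from bisect import bisect_left, bisect_right

    def build_suffix_array(tokens):
        # Naive construction: sort the starting positions of all suffixes.
        # (Real implementations use O(n log n) or linear-time algorithms.)
        return sorted(range(len(tokens)), key=lambda i: tokens[i:])

    def ngram_count(tokens, sa, ngram):
        # Every occurrence of the n-gram is a prefix of some suffix, and those
        # suffixes form a contiguous block in the suffix array, so two binary
        # searches give the count for any n without a precomputed n-gram table.
        n = len(ngram)
        key = lambda i: tokens[i:i + n]
        return (bisect_right(sa, list(ngram), key=key)
                - bisect_left(sa, list(ngram), key=key))

    tokens = "the cat sat on the mat and the cat ran".split()
    sa = build_suffix_array(tokens)
    print(ngram_count(tokens, sa, ("the", "cat")))   # -> 2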
The problem of identifying and correcting confusibles, i.e. context-sensitive spelling errors, in te...
In this paper, an extension of n-grams is proposed. In this extension, the memory of the model (n) i...
Language models typically tokenize text into subwords, using a deterministic, hand-engineered heuris...
Verwimp L., Pelemans J., Van hamme H., Wambacq P., "Extending n-gram language models based on equiv...
In this paper, an extension of n-grams, called x-grams, is proposed. In this extension, the memory o...
The proper detection of tokens in running text represents the initial processing step in modular ...
Building models of language is a central task in natural language processing. Traditionally, languag...
The work presents the task of spelling correction realized in a batch mode with support...
Natural Language is highly ambiguous, on every level. This article describes a fast broad-coverage s...