In this paper we examine several combinations of classical N-gram language models with more advanced, well-known techniques based on word similarity, such as cache models and Latent Semantic Analysis. We compare the efficiency of these combined models to a model that combines N-grams with the recently proposed, state-of-the-art neural network-based continuous skip-gram. We discuss the strengths and weaknesses of each of these models based on their predictive power for the Dutch language, and find that a linear interpolation of a 3-gram, a cache model and a continuous skip-gram reduces perplexity by up to 18.63% compared to a 3-gram baseline; this is three times the reduction achieved with a 5-gram. In addition, we investigat...
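The combination scheme named in the abstract is standard linear interpolation: each component model assigns the next word a probability, and the combined estimate is a weighted sum. A minimal sketch, assuming placeholder probabilities and weights (the actual 3-gram, cache, and skip-gram components and their tuned weights are not given here):

```python
# Minimal sketch of linear interpolation of language-model probabilities.
# The probability values and weights below are illustrative placeholders,
# not numbers from the paper; real components would be a 3-gram model,
# a cache model and a continuous skip-gram, with weights tuned on held-out data.

def interpolate(probs, weights):
    """Linearly interpolate per-model estimates of P(w | history).

    probs   -- one probability per component model, for the same next word
    weights -- interpolation weights, non-negative and summing to 1
    """
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(lam * p for lam, p in zip(weights, probs))

# Hypothetical example: combine 3-gram, cache and skip-gram estimates.
p_combined = interpolate(probs=[0.012, 0.030, 0.020],
                         weights=[0.6, 0.2, 0.2])
```

Because the weights sum to one and each component is a valid distribution, the interpolated model remains a valid distribution, which is what makes perplexity comparisons against the 3-gram baseline well defined.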
This study investigates the learning mechanisms underlying the acquisition of a dialect as a second ...
This thesis intends to compare the influence of lexical frequency with the influence of a constraint...
Languages are not uniform. Speakers of different language varieties use certain words differently - ...
Pelemans J., De Laet B., Van hamme H., Wambacq P., "The effect of word similarity on N-gram languag...
Verwimp L., Pelemans J., Van hamme H., Wambacq P., "Extending n-gram language models based on equiv...
Since the advent of deep learning, automatic speech recognition (ASR), like many other fields, has a...
© 2015 Lyan Verwimp, Joris Pelemans, Hugo Van hamme, Patrick Wambacq. The subject of this paper is t...
This article builds on computational tools to investigate the syntactic relationship between the hig...
International audienceThis paper describes an extension of the n-gram language model: the similar n-...
In recent years neural language models (LMs) have set state-of-the-art performance for several bench...
Conventional n-gram language models are well-established as powerful yet simple mechanisms for chara...
Abstract. This paper investigates a variety of statistical cache-based language models built upon th...
In this paper we investigate the differences in risk-averse behavior in translated versus non-transl...
Bilingual speakers store cognates from related languages close together in their mental lexicon. In ...