Computational approaches to language identification often produce a high number of false positives and low recall, especially when the languages involved come from the same subfamily. In this paper, we aim to determine the cause of this problem by measuring language similarity through trigrams. Religious and literary texts were used as training data. Our language identification experiments show that the number of common trigrams for a given language pair is inversely proportional to precision and recall, whereas the average word length is directly proportional to the number of true positives. Future directions include improving language modeling and developing an approach to increase precision and recall. © 2013 IEEE
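The two quantities the abstract relates to identification quality, the number of common trigrams for a language pair and the average word length, can be illustrated with a minimal sketch. It assumes character trigrams over lowercased, whitespace-normalized text; the abstract does not state whether character or word trigrams were used, nor how the text was preprocessed, and the function names below are hypothetical rather than taken from the paper.

```python
from collections import Counter


def char_trigrams(text):
    """Count character trigrams in lowercased, whitespace-normalized text.

    Assumption: character-level trigrams; the abstract does not specify this.
    """
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + 3] for i in range(len(normalized) - 2))


def common_trigrams(profile_a, profile_b):
    """Return the set of distinct trigrams that occur in both profiles."""
    return set(profile_a) & set(profile_b)


def average_word_length(text):
    """Mean length of whitespace-separated tokens; 0.0 for empty text."""
    words = text.split()
    return sum(len(w) for w in words) / len(words) if words else 0.0


if __name__ == "__main__":
    # Toy strings standing in for two training corpora (not real data).
    profile_a = char_trigrams("abcd")
    profile_b = char_trigrams("bcde")
    print(len(common_trigrams(profile_a, profile_b)))
    print(average_word_length("ab cdef"))
```

In practice each profile would be built from a full training corpus per language, and the size of the shared set would be compared against the precision and recall observed for that language pair.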
Language identification of written text has been studied for several decades. Despite this fact, mos...
The classification accuracy of text-based language identification depends on several factors, includ...
We propose a method for computing the similarity of natural languages and for clustering them based ...
In this study, we present Dice's coefficient on trigram profiles as metric for language similarit...
Several studies regarding excellent exact string matching algorithms can be used to identify similar...
In a series of preparatory experiments in 4 languages on subsets of the Europa...
In this paper several methods are proposed for reducing the size of a trigram language model (LM), w...
We present two new measures of syntactic distance between languages. First, we present the movement ...
This paper addresses the problems of measuring similarity between languages, where the term languag...
We introduce a new measure of distance between languages based on word embedding, called word embedd...
In this paper we show how a single framework for computational modeling of linguistic similarity can...
We propose a novel method for using the World Wide Web to acquire trigram estimates for statistical ...
The paper reports several studies about quantifying language similarity via phonetic alignm...
This paper will focus on automatic methods for quantifying language similarity. This is achieved by ...