Abstract. We present an approach to calculating the semantic similarity of documents written in the same or in different languages. The similarity calculation is achieved by representing the document contents in a language-independent way, using the descriptor terms of the multilingual thesaurus EUROVOC, and then calculating the distance between these representations. While EUROVOC is a carefully handcrafted knowledge structure, our procedure uses statistical techniques. The method was applied to a collection of 5990 English and Spanish parallel texts and evaluated by measuring the number of times the translation of a given document was identified as the most similar document. The good results showed the feasibility and usefuln...
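The procedure described in the abstract can be illustrated with a minimal sketch. It assumes each document has already been mapped to a weighted set of thesaurus descriptors; how those weights are obtained is not shown here. Cosine similarity is an assumed choice of measure, since the abstract only speaks of "the distance between these representations". The descriptor names, weights, document ids, and the function names `cosine` and `top1_accuracy` are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse descriptor-weight vectors (dicts)."""
    dot = sum(w * b.get(desc, 0.0) for desc, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty representation is similar to nothing
    return dot / (norm_a * norm_b)


def top1_accuracy(source_docs, target_docs):
    """Fraction of source documents whose most similar target document
    is their own translation (identified here by a shared document id)."""
    hits = 0
    for doc_id, vec in source_docs.items():
        best = max(target_docs, key=lambda t: cosine(vec, target_docs[t]))
        if best == doc_id:
            hits += 1
    return hits / len(source_docs)


# Toy data: hypothetical descriptor weights, not taken from the paper.
english = {
    "doc1": {"fisheries policy": 0.9, "catch quota": 0.4},
    "doc2": {"taxation": 0.8, "VAT": 0.6},
}
spanish = {
    "doc1": {"fisheries policy": 0.8, "catch quota": 0.5, "trade": 0.1},
    "doc2": {"taxation": 0.7, "VAT": 0.7},
}

accuracy = top1_accuracy(english, spanish)
```

Because both languages are projected onto the same descriptor vocabulary, the comparison itself never needs to look at the words of either text, which is what makes the representation language-independent.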
Wikipedia has been used as a source of comparable texts for a range of tasks, such as Statistical Ma...
Quantifying the similarity or dissimilarity between documents is an important task in authorship att...
In this paper we present a metric that measures comparability of documents across different language...
This work addresses the issue of cross-language high similarity and near-duplicates search, where, f...
This paper compares the performance of one thesaurus-based approach against three l...
Texts and their translations are a rich linguistic resource that can be used to train and test stati...
Bilingual or even polylingual word embeddings created many possibilities for tasks involving multipl...
Despite being one of the most popular tasks in lexical semantics, word similarity has often been lim...
This paper studies cross-lingual semantic similarity (CLSS) between five European languages (i.e. En...