Describing, comparing and evaluating corpora are key issues in corpus-based translation and corpus linguistics, for which there is still a notable lack of standards. Bearing this in mind, this paper investigates the use of textual distributional similarity measures in the context of comparable corpora. More precisely, we address the issue of measuring the relatedness between documents by extracting and measuring their common content. For this purpose, we designed and applied a methodology that combines available natural language processing technology with statistical methods. Our findings show that using a list of common entities and a simple, yet robust and high-performance, set of distributional similarity measures was enough to ...
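As a rough illustration of measuring relatedness through common content, the following minimal sketch scores two tokenised documents by their shared items. It is not the methodology of the paper: the function names (`common_entities`, `jaccard`, `cosine`) and the choice of Jaccard and cosine similarity are assumptions made for the example, and the documents are assumed to be already reduced to lists of entity strings by some upstream NLP step.

```python
from collections import Counter
from math import sqrt


def common_entities(doc_a, doc_b):
    """Return the set of items that occur in both documents (hypothetical helper)."""
    return set(doc_a) & set(doc_b)


def jaccard(doc_a, doc_b):
    """Jaccard similarity: shared distinct items over all distinct items."""
    a, b = set(doc_a), set(doc_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(doc_a, doc_b):
    """Cosine similarity between the raw frequency vectors of the two documents."""
    ca, cb = Counter(doc_a), Counter(doc_b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0
```

Both scores lie in the range 0 to 1 for frequency data of this kind. Jaccard ignores how often an item occurs, whereas cosine weights items by frequency, so the two can rank document pairs differently.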
In this paper, we consider two applications of distributional similarity measures, probability estim...
This paper presents a method for measuring the semantic similarity of texts, using corpus-based and ...
Wikipedia articles in different languages have been mined to support various tasks, such as Cross-La...
Decisions at the outset of compiling a comparable corpus are of crucial importance for how the corpu...
Quantifying the similarity or dissimilarity between documents is an important task in authorship att...
This research addresses the problem of deriving semantic similarity between words of language using ...
In this paper, we present a model of statistical word-level mapping for comparable corpora. The appr...
Corpus linguistics lacks strategies for describing and comparing corpora. Currently most descriptio...
In this paper, we question the homogeneity of a large parallel corpus by measuring the similarity be...
We present a comprehensive study of computing similarity between texts. We start from the observatio...
The concept of a Document Similarity Measure is ill-defined due to the wide variety of existing me...
In recent years a variety of approaches in computing semantic relatedness have been proposed. Howev...