Abstract: Similarities for textual data. The evaluation of similarities between textual entities (documents, sentences, words, ...) is a central issue in implementing efficient methods for tasks such as the description and exploration of textual data, information retrieval, or knowledge extraction (text mining). This contribution offers a comparative presentation of the approaches used to define similarity in fields such as Textual Data Analysis, Information Retrieval, and Text Mining. We first discuss some of the linguistic treatments (tagging, lemmatization, …) needed to pre-process the textual data and then analyze some of the measures (cosine, chi-square, Kullback-...
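As a rough illustration of the cosine measure named in the abstract above, the following minimal sketch compares two texts as bag-of-words count vectors. The whitespace tokenizer and the function names `tokenize` and `cosine_similarity` are assumptions made for this example only; the abstract does not specify a tokenization or weighting scheme, and the pre-processing it mentions (tagging, lemmatization) is not reproduced here.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace. This stands in for the
    # linguistic pre-processing the abstract mentions, which is not shown.
    return text.lower().split()


def cosine_similarity(text_a, text_b):
    # Represent each text as a vector of raw term counts.
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))

    # Dot product over the terms the two texts share.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)

    # Euclidean norms of the two count vectors.
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))

    # An empty text has no direction; treat its similarity as 0.0.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

With raw counts the score lies between 0 (no shared terms) and 1 (identical term distributions). Other weightings, such as tf-idf, would change the vectors but not the formula.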
We present a system to determine content similarity of documents. More specifically, our goal is to...
This paper reports experiments on a corpus of news articles from the Financial Times, comparing diff...
We present a comprehensive study of computing similarity between texts. We start from the observatio...
We present a new composite similarity metric that combines information from multiple linguistic indi...
Quantifying the similarity or dissimilarity between documents is an important task in authorship att...
This paper presents a method for measuring the semantic similarity of texts using a corpus based mea...
The massive amount of information from the internet has revolutionized the field of natural language...
This paper presents a method for measuring the semantic similarity of texts, using corpus-based and ...
Computing text similarity is a foundational technique for a wide range of tasks in natural language ...
Proceedings of the 16th Nordic Conference of Computational Linguistics NODALIDA-2007. Editors: Jo...
The concept of a Document Similarity Measure is ill-defined due to the wide variety of existing me...
Measuring document similarity has shown its fundamental utilization in various text mining applicati...
Text similarity measurement compares text with available references to indicate the degree of simila...