Measuring pairwise document similarity is critical to many text retrieval and mining tasks. The most popular measure of document similarity is the cosine measure in the Vector Space Model. In this paper, we propose a new similarity measure based on optimal matching in graph theory. The proposed measure takes a document's structural information into account by considering the word distributions over its different text segments. It first calculates the similarities between pairs of text segments from the two documents and then obtains the overall similarity between the documents through optimal matching. We set up document similarity search experiments to test the effectiveness of the proposed measure. The experimental results and us...
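The two-step idea in the abstract (segment-to-segment similarities, then an optimal matching over them) can be illustrated with a minimal sketch. This is not the authors' implementation. Several details are assumptions made only for illustration: segments are taken to be non-empty lines, segment similarity is the cosine of raw term-frequency vectors, the optimal one-to-one matching is found by brute force over permutations, and the total is normalized by the larger segment count. The names `segment_vectors` and `om_similarity` are hypothetical.

```python
from collections import Counter
from itertools import permutations
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def segment_vectors(doc):
    """Assumed segmentation: one segment per non-empty line, whitespace tokens."""
    return [Counter(line.lower().split()) for line in doc.splitlines() if line.strip()]


def om_similarity(doc_a, doc_b):
    """Similarity via a maximum-weight one-to-one matching of segments."""
    segs_a, segs_b = segment_vectors(doc_a), segment_vectors(doc_b)
    if not segs_a or not segs_b:
        return 0.0
    # Keep the document with fewer segments on the left side of the matching.
    if len(segs_a) > len(segs_b):
        segs_a, segs_b = segs_b, segs_a
    # Step 1: similarity for every pair of segments.
    sim = [[cosine(a, b) for b in segs_b] for a in segs_a]
    # Step 2: brute-force search for the best one-to-one assignment.
    # This is exponential in the segment count and only suits tiny inputs.
    best = max(
        sum(sim[i][j] for i, j in enumerate(perm))
        for perm in permutations(range(len(segs_b)), len(segs_a))
    )
    # Assumed normalization: unmatched segments lower the score.
    return best / len(segs_b)


if __name__ == "__main__":
    d1 = "apple banana\ncar engine"
    d2 = "car engine\napple banana"
    print(om_similarity(d1, d2))
```

For documents with more than a handful of segments, the permutation search would normally be replaced by a polynomial-time assignment solver such as the Kuhn-Munkres algorithm (for example `scipy.optimize.linear_sum_assignment` with `maximize=True`); the brute-force version is used here only to keep the sketch dependency-free.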
In this paper we propose a novel measure based on the earth mover's distance (EMD) to evaluate ...
A novel document similarity measure based on the Proportional Transportation Distance (PTD) is propo...
Recent advance research in data warehousing and data mining emerges various types of information sou...
Measuring pairwise document similarity is critical to various text retrieval and mining tasks. The m...
Document similarity search is to find documents similar to a given query document and return a ranke...
Accurately measuring document similarity is important for many text applications, e.g. document simi...
Abstract Measuring pairwise document similarity is an essential operation in various text mining tas...
Due to the tremendous amount of information provided by the World Wide Web (WWW) developing methods ...
Inter-document similarity is the critical information which determines whether or not the cluster-ba...
Abstract Text similarity measurement aims to find the commonality existing among text documents, whi...
Accurate, efficient and fast processing of textual data and classification of electronic documents h...
Text similarity measurement is a fundamental issue in many textual applications such as document clu...
Most known methods for measuring the structural similarity of document structures are based on, e.g....
Document similarity has important real life applications such as finding duplicate web sites and ide...
The similarity of documents is typically computed using fairly simple similarity measures, such as m...