Measuring pairwise document similarity is critical to various text retrieval and mining tasks. The most popular measure of document similarity is the cosine measure in the Vector Space Model. In this paper, we propose a new similarity measure based on optimal matching from graph theory. The proposed measure takes a document's structural information into account by considering the word distributions over its text segments. It first computes the similarity of each pair of text segments drawn from the two documents and then derives the overall document similarity through an optimal matching of those segments. We evaluate the effectiveness of the proposed measure in document similarity search experiments. The experimental results and us...
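The two-step idea in the abstract above (score segment pairs, then combine the scores through an optimal matching) can be illustrated with a minimal sketch. This is not the authors' implementation. Several choices here are assumptions made only for illustration: segments are taken to be blank-line-separated paragraphs, segment pairs are scored with the cosine measure over raw term counts, the matching is found by brute force (practical only for a handful of segments), and the total is normalized by the smaller segment count. The function names are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def segment_vectors(doc: str) -> list:
    """Assumption: a text segment is a paragraph separated by a blank line."""
    return [Counter(p.lower().split()) for p in doc.split("\n\n") if p.strip()]


def optimal_matching_similarity(doc_a: str, doc_b: str) -> float:
    """Best one-to-one pairing of segments, scored by summed cosine similarity."""
    segs_a, segs_b = segment_vectors(doc_a), segment_vectors(doc_b)
    if not segs_a or not segs_b:
        return 0.0
    # Make segs_a the shorter list so each of its segments gets a partner.
    if len(segs_a) > len(segs_b):
        segs_a, segs_b = segs_b, segs_a
    sim = [[cosine(x, y) for y in segs_b] for x in segs_a]
    # Brute-force search over all injective assignments of segs_a into segs_b.
    best = max(
        sum(sim[i][j] for i, j in enumerate(assignment))
        for assignment in permutations(range(len(segs_b)), len(segs_a))
    )
    # Assumption: normalize by the number of matched pairs.
    return best / len(segs_a)
```

For example, two documents containing the same paragraphs in a different order score close to 1.0 under this sketch, because the matching pairs each paragraph with its counterpart regardless of position. For documents with many segments, a polynomial-time assignment solver such as `scipy.optimize.linear_sum_assignment` would be a more practical replacement for the brute-force search.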
In this paper we propose a novel measure based on the earth mover's distance (EMD) to evaluate ...
Document similarity measures are crucial components of many text-analysis tasks, including informati...
Recent advances in data warehousing and data mining research have given rise to various types of information sou...
Document similarity search is to find documents similar to a given query document and return a ranke...
Measuring pairwise document similarity is an essential operation in various text mining tas...
Due to the tremendous amount of information provided by the World Wide Web (WWW), developing methods ...
Accurately measuring document similarity is important for many text applications, e.g. document simi...
Most known methods for measuring the structural similarity of document structures are based on, e.g....
Inter-document similarity is the critical information which determines whether or not the cluster-ba...
Text similarity measurement aims to find the commonality existing among text documents, whi...
Accurate, efficient and fast processing of textual data and classification of electronic documents h...
A novel document similarity measure based on the Proportional Transportation Distance (PTD) is propo...
The similarity of documents is typically computed using fairly simple similarity measures, such as m...
Text similarity measurement is a fundamental issue in many textual applications such as document clu...