e of Duplicates

Kyle Williams
C. Lee Giles

Publication date

January 2016

Abstract

Figure: The recursive search strategy uses search results to find more results and then combines them.

SimSeer [2] is a similarity search engine that incorporates the recursive similarity search algorithm. It currently supports three similarity functions, and more can be added:
• Key phrase similarity
• Shingle similarity [3], based on sequences of words in a document
• Simhash similarity [4], based on all of the words in a document

The need to find similar documents arises in many situations:
• Plagiarism detection
• Near-duplicate detection
• Research paper recommendation

Traditionally, queries are constructed by users and submitted to search engines; however, it may not be obvious to users how they should construct such q...
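The two word-based similarity functions listed above can be illustrated with a minimal Python sketch. It shows word-shingle resemblance (Jaccard overlap of contiguous word sequences, in the spirit of [3]) and a simhash fingerprint compared by Hamming distance (in the spirit of [4]). The shingle width w = 3, the lowercase alphanumeric tokenizer, the 64-bit fingerprint, and MD5 as the per-word hash are all illustrative assumptions, not SimSeer's actual settings, and the helper names are hypothetical.

```python
import hashlib
import re


def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lowercase alphanumeric words only.
    return re.findall(r"[a-z0-9]+", text.lower())


def shingles(text: str, w: int = 3) -> set[tuple[str, ...]]:
    # Set of contiguous w-word sequences (w-shingles); empty if fewer than w words.
    words = tokenize(text)
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def shingle_similarity(a: str, b: str, w: int = 3) -> float:
    # Jaccard resemblance: |S(a) & S(b)| / |S(a) | S(b)|.
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def simhash(text: str, bits: int = 64) -> int:
    # Assumes bits is a multiple of 8 and at most 128 (the MD5 digest size).
    counts = [0] * bits
    for word in tokenize(text):
        # Each word occurrence votes +1/-1 on every bit position of its hash.
        digest = hashlib.md5(word.encode("utf-8")).digest()
        h = int.from_bytes(digest[:bits // 8], "big")
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    # The fingerprint sets bit i wherever the vote total is positive.
    return sum(1 << i for i in range(bits) if counts[i] > 0)


def simhash_similarity(a: str, b: str, bits: int = 64) -> float:
    # 1 - (Hamming distance / bits); near-duplicates differ in few bits.
    distance = bin(simhash(a, bits) ^ simhash(b, bits)).count("1")
    return 1.0 - distance / bits


doc = "the quick brown fox jumps over the lazy dog"
near = "the quick brown fox jumped over the lazy dog"
print(shingle_similarity(doc, near))  # 0.4 (4 shared of 10 distinct shingles)
print(simhash_similarity(doc, near))
```

Changing one word removes every shingle that spans it, so shingle similarity is sensitive to word order and local edits. Simhash instead aggregates votes from all words, so a single substitution shifts only a few fingerprint bits, which makes it cheap to compare many documents by Hamming distance.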
