Abstract: In this paper, the provenance matrix is refined by adding two more factors, 'How' and 'Why', to achieve greater accuracy and efficiency in detecting near-duplicates, since the performance of web search depends on search results being free of duplicates and redundancy. More redundancy consumes more time and storage, which is why search engines try to avoid indexing duplicate documents. The provenance model combines both content-based and trust-based factors to classify documents as near-duplicates or originals, since nowadays many near-duplicates originate from distrusted websites.
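The abstract does not spell out how the refined matrix scores a candidate page, so the following is a minimal sketch, assuming the model linearly combines a content-similarity value with a trust value averaged over five W-factors (Who, Where, When, How, Why). The factor names beyond 'How' and 'Why', the weights, and the threshold are all illustrative assumptions, not the paper's actual formulation.

```python
# Hedged sketch of a provenance-style near-duplicate classification.
# ASSUMPTIONS: factor names, weights, and the decision threshold are
# illustrative; the paper's actual provenance matrix may differ.

from dataclasses import dataclass

@dataclass
class Provenance:
    who: float    # trust in the author/source, in [0, 1]
    where: float  # trust in the hosting website, in [0, 1]
    when: float   # originality of the timestamp, in [0, 1]
    how: float    # how the page was produced (scraped vs. authored)
    why: float    # apparent intent (mirror vs. spam), in [0, 1]

def trust_score(p: Provenance) -> float:
    """Average the five W-factors into a single trust value."""
    return (p.who + p.where + p.when + p.how + p.why) / 5.0

def classify(content_sim: float, p: Provenance,
             sim_weight: float = 0.6, threshold: float = 0.5) -> str:
    """Flag a page as a near-duplicate when it is both similar in
    content and poorly trusted; weights/threshold are assumptions."""
    score = sim_weight * content_sim + (1.0 - sim_weight) * (1.0 - trust_score(p))
    return "near-duplicate" if score > threshold else "original"

if __name__ == "__main__":
    # A page 90% similar to an indexed one, hosted on a distrusted site:
    p = Provenance(who=0.2, where=0.1, when=0.3, how=0.2, why=0.1)
    print(classify(content_sim=0.9, p=p))  # -> near-duplicate
```

The design intent is that high content similarity alone is not decisive: a similar page from a highly trusted source can still be ranked as an original, while the same similarity from a distrusted site tips the score past the threshold.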
Excerpts from related work on duplicate and near-duplicate detection:

- Motivation: Document similarity metrics such as PubMed's "Find related articles" feature, which hav...
- We consider how to efficiently compute the overlap between all pairs of web documents. This inform...
- This paper expands on a 1997 study of the amount and distribution of near-duplicate pages on the Wo...
- Users of the World Wide Web utilize search engines for information retrieval on the web, as search engines pl...
- Abstract: The World Wide Web consists of more than 50 billion pages online. The advent of the World W...
- Abstract: The mathematical concept of document resemblance captures well the informal notion of syn... (see the resemblance sketch after this list)
- Many documents are replicated across the World-wide Web. How to efficiently and accurately find the...
- Recent years have witnessed the drastic development of the World Wide Web (WWW). Information is being ac...
- The existence of billions of web documents has severely affected the performance and reliability of web s...
- The presence of replicas or near-replicas of documents is very common on the Web. Documents may be r...
- The presence of near-replicas of documents is very common on the Web. Documents may be replicated co...
- [Figure: The recursive search strategy uses search results to find more results and then combines them...]
- Abstract: Duplication of Web pages greatly hurts the perceived relevance of a search engine. Existin...
- Duplicate and near-duplicate web pages are the chief concerns for web search engines. In reality, th...
- The ever-growing amounts of textual information coming from different sources have fostered the deve...
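One excerpt above invokes the mathematical concept of document resemblance. As a quick illustration, below is a minimal sketch of estimating resemblance with min-wise hashing (MinHash) over word shingles, in the spirit of Broder's resemblance work; the shingle width, hash count, and seeding scheme are illustrative assumptions rather than any cited paper's exact method.

```python
# Hedged sketch: estimating document resemblance via MinHash.
# ASSUMPTIONS: shingle width w=4, 64 hash functions, and md5-based
# seeded hashing are illustrative choices, not a cited paper's setup.

import hashlib

def shingles(text: str, w: int = 4) -> set:
    """Set of contiguous word w-shingles of the document."""
    words = text.lower().split()
    return {" ".join(words[i:i + w]) for i in range(len(words) - w + 1)}

def minhash_signature(sh: set, num_hashes: int = 64) -> list:
    """For each seeded hash function, keep the minimum hash value over
    all shingles; two documents agree on a given slot with probability
    equal to their true resemblance (Jaccard coefficient)."""
    sig = []
    for seed in range(num_hashes):
        sig.append(min(
            int(hashlib.md5(f"{seed}:{s}".encode()).hexdigest(), 16)
            for s in sh))
    return sig

def estimated_resemblance(a: str, b: str) -> float:
    """Fraction of matching signature slots, an estimate of resemblance."""
    sa, sb = minhash_signature(shingles(a)), minhash_signature(shingles(b))
    return sum(x == y for x, y in zip(sa, sb)) / len(sa)

if __name__ == "__main__":
    d1 = "many documents are replicated across the world wide web and search engines must detect them"
    d2 = "many documents are replicated across the world wide web so search engines must detect them"
    print(f"estimated resemblance: {estimated_resemblance(d1, d2):.2f}")
```

Because each signature slot matches with probability equal to the true resemblance, averaging slot matches gives an unbiased estimate without ever comparing the full shingle sets, which is what makes the approach practical at web scale.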