The existence of billions of web documents has severely affected the performance and reliability of web search. Near-duplicate web pages play a major role in this performance degradation, particularly when integrating data from heterogeneous sources, and they pose serious problems for web mining. Such pages inflate the index storage space and thereby increase the serving cost. Efficient methods to detect and remove these documents from the Web not only decrease computation time but also increase the relevancy of search results. We propose a novel approach for finding near-duplicate web pages that can be applied to plagiarism detection, spam detection, and focused web crawling scenarios.
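As a purely illustrative aside (a minimal sketch, not the specific method this abstract proposes), near-duplicate detection is commonly explained in terms of word shingling and Jaccard similarity. In the Python sketch below, the shingle size k = 4 and the 0.9 similarity threshold are assumptions chosen for the example.

```python
# Minimal near-duplicate check via word shingling and Jaccard similarity.
# Illustrative only: shingle size and threshold are assumed values.

def shingles(text: str, k: int = 4) -> set:
    """Return the set of k-word shingles of a document."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b| of two shingle sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def is_near_duplicate(doc1: str, doc2: str, threshold: float = 0.9) -> bool:
    """Two documents are near-duplicates if their shingle sets largely overlap."""
    return jaccard(shingles(doc1), shingles(doc2)) >= threshold
```

Shingling makes the comparison robust to small edits: inserting one word changes only the k shingles that cover it, so heavily overlapping pages still score close to 1.0.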
A relevant consequence of the expansion of the web and e-commerce is the growing demand for new...
We consider how to efficiently compute the overlap between all pairs of web documents. This inform...
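As a hedged sketch of how all-pairs overlap can be computed without comparing every pair of documents directly, one common approach (assumed here for illustration, not necessarily the excerpted paper's exact scheme) builds an inverted index from shingles to documents and counts co-occurrences:

```python
# Count shared shingles for every pair of documents via an inverted index.
# Sketch under assumed inputs: docs maps a document id to its shingle set.

from collections import defaultdict
from itertools import combinations

def pairwise_overlap(docs: dict) -> dict:
    index = defaultdict(set)            # shingle -> ids of documents containing it
    for doc_id, shingle_set in docs.items():
        for s in shingle_set:
            index[s].add(doc_id)
    overlap = defaultdict(int)          # (id1, id2) -> number of shared shingles
    for doc_ids in index.values():
        for a, b in combinations(sorted(doc_ids), 2):
            overlap[(a, b)] += 1        # each shared shingle adds one to the pair
    return overlap
```

Only pairs that actually share a shingle are ever touched; in practice, very frequent shingles are usually dropped first, since their long posting lists dominate the quadratic pair-counting step.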
Recent years have witnessed the rapid growth of the World Wide Web (WWW). Information is being ac...
Duplicate and near-duplicate web pages are the chief concerns for web search engines. In reality, th...
Users of the World Wide Web rely on search engines for information retrieval, as search engines pl...
The World Wide Web consists of more than 50 billion pages online. The advent of the World W...
Detecting similar or near-duplicate pairs in a large collection is an important problem with wide-sp...
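A widely used technique for this setting is MinHash, sketched below. The signature length of 128 and the use of Python's built-in hash with random salts are simplifying assumptions; production systems use stable hash families and add LSH banding over the signatures to avoid comparing all pairs.

```python
# MinHash signatures: the fraction of positions on which two signatures
# agree estimates the Jaccard similarity of the underlying sets.
# Sketch only: Python's hash() is not stable across processes.

import random

def minhash_signature(shingle_set: set, num_hashes: int = 128, seed: int = 0) -> list:
    """Signature of a non-empty shingle set under num_hashes salted hash functions."""
    rng = random.Random(seed)
    salts = [rng.getrandbits(64) for _ in range(num_hashes)]
    return [min(hash((salt, s)) for s in shingle_set) for salt in salts]

def estimated_jaccard(sig_a: list, sig_b: list) -> float:
    """Fraction of agreeing positions approximates the true Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)
```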
Duplication of Web pages greatly hurts the perceived relevance of a search engine. Existin...
Many documents are replicated across the World Wide Web. How to efficiently and accurately find the ...
With the rapid development of the World Wide Web, there are a huge number of fully or fragmentally d...