Abstract. Tools that allow effective information organisation, access and navigation are becoming increasingly important on the Web. Similarity between web pages is a concept that is central to such tools. In this paper, we examine the effect that content and layout-related aspects of web pages have on web page similarity. We consider the textual content contained within common HTML tags, the structural layout of pages, and the query terms contained within pages. Our study shows that combinations of factors can yield more promising results than individual factors, and that different aspects of web pages affect similarities between pages in a different manner. We found a number of factors that, when taken into account, can result in effect...
A common task in both Webmetrics and Web information retrieval is to identify a set of Web pages or ...
The World Wide Web provides a wealth of data that can be harnessed to help improve information retri...
Abstract. When we describe a Web page informally, we often use phrases like "it looks like a newspap...
We present general-purpose methods for recognizing certain types of structure in HTML documents. The...
We present preliminary findings of a quantitative analysis of several attributes of Web page layout ...
Search engines use content and link information to crawl, index, retrieve, and rank Web pages. The c...
To find similar web pages to a query page on the Web, this paper introduces a novel link-based simil...
Finding and obtaining information efficiently from the Web is one of the important elements in real...
To utilize the similarity information hidden in the Web graph, we investigate the problem of adaptiv...
As the number of web pages increases, search for useful information by users on web sites will becom...
This paper proposes a hyperlink-based web page similarity measurement and two matrix-based hierarchi...
In this paper, a novel approach is introduced to compare web sites by analysing their web page conte...
In data-intensive web sites pages are generated by scripts that embed data from a backend database i...
This paper has been modified and extended from our prior work, presented at IEEE ACIS/ICIS 2010. The...