Abstract. Most existing web page clustering algorithms are based on short, uneven snippets of web pages, which often leads to poor clustering performance. Classical clustering algorithms for full-text web pages, on the other hand, are too complex to produce good cluster labels and cannot cluster on-line. To address these problems, this article presents an on-line web page clustering algorithm based on maximal frequent item sets (MFIC). First, the maximal frequent item sets are mined and the web pages are clustered according to the frequent item sets they share. Second, the clusters are labelled based on the frequent items. Experimental results show that MFIC can effectively reduce clustering time, improve clustering accurac...
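The two steps described in the abstract can be illustrated with a small sketch. This is not the authors' MFIC implementation: it assumes each page is already reduced to a set of terms, enumerates frequent item sets by brute force, and uses a hypothetical assignment rule (each page goes to the largest maximal frequent item set it contains, with an `"other"` bucket for uncovered pages). The function names and the `min_support` parameter are illustrative choices, not names from the paper.

```python
from itertools import combinations


def frequent_itemsets(docs, min_support):
    """Level-wise (Apriori-style) enumeration; returns {itemset: support count}."""
    frequent = {}
    items = sorted(set().union(*docs))
    current = [frozenset([item]) for item in items]
    while current:
        # Count how many pages contain each candidate item set.
        counts = {c: sum(1 for d in docs if c <= d) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join frequent k-item sets into (k+1)-item candidates.
        keys = list(level)
        current = list(
            {a | b for a, b in combinations(keys, 2) if len(a | b) == len(a) + 1}
        )
    return frequent


def maximal_itemsets(frequent):
    """Keep only frequent item sets that have no frequent proper superset."""
    return [s for s in frequent if not any(s < t for t in frequent)]


def cluster_pages(docs, min_support):
    """Group page indices by a shared maximal frequent item set.

    Assumed rule (not from the paper): a page joins the largest maximal
    item set it contains; the sorted items of that set form the label.
    """
    maximal = maximal_itemsets(frequent_itemsets(docs, min_support))
    clusters = {}
    for idx, doc in enumerate(docs):
        covering = [m for m in maximal if m <= doc]
        if not covering:
            clusters.setdefault("other", []).append(idx)  # hypothetical fallback
            continue
        best = max(covering, key=lambda m: (len(m), sorted(m)))
        clusters.setdefault(" ".join(sorted(best)), []).append(idx)
    return clusters


pages = [
    {"apple", "fruit", "juice"},
    {"apple", "fruit", "pie"},
    {"apple", "iphone", "mac"},
    {"apple", "iphone", "store"},
    {"weather"},
]
print(cluster_pages(pages, min_support=2))
```

With a minimum support of two pages, the toy input separates the "apple fruit" pages from the "apple iphone" pages, and the cluster label falls directly out of the shared item set. The brute-force enumeration is only suitable for tiny inputs; a real system would need a dedicated maximal frequent item set miner.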
Clustering is increasingly applied to hyperlinked documents, especially for web search re...
The clustering of topic-related web pages has been recognized as a foundational work in exploiting l...
We propose a system that clusters web pages and presents them as a hierarchical structure instead of...
Clustering Web documents efficiently yet accurately is of great interest to Web users and is a ...
With the increase in information on the World Wide Web it has become difficult to find the desired ...
Document clustering based on frequent itemsets can be regarded as a new method of document clustering...
As the storage capacity and the processing speed of search engines grow to keep up with the con...
Clustering is the process of organizing objects into groups whose members are similar in some way. I...
Clustering is well-suited for Web mining by automatically organizing Web pages into categories, each...
Abstract—As resources become more and more available on the Web, so the difficulties associated with...
A cluster is a gathering of similar objects which can exhibit dissimilarity to the objects of other ...
In this paper, we propose a system that clusters web pages and presents them as a hierarchical struc...
With the increase in information on the World Wide Web it has become difficult to find desired infor...
Abstract: Web Content Mining generally consists of mining the content held within the web pages. The a...
People use web search engines to fill a wide variety of navigational, informational and transactiona...