Clustering Web documents efficiently and accurately is of great interest to Web users and is a key factor in the search accuracy of a Web search engine. To this end, this paper introduces a new approach to clustering Web documents, called the maximal frequent itemset (MFI) approach. Iterative clustering algorithms, such as K-means and expectation-maximization (EM), are sensitive to their initial conditions. The MFI approach first locates the center points of high-density clusters precisely. These center points are then used as the initial points for the K-means algorithm. Experimental results on three Web document sets show that the MFI approach outperforms the other methods we compared in most cases, particular...
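The seeding idea can be illustrated with a minimal sketch. The abstract does not give the algorithmic details, so everything below is an assumption made for illustration: documents are treated as sets of terms, frequent itemsets are found with a simple level-wise (Apriori-style) search, each seed is taken to be the mean vector of the documents containing one maximal frequent itemset, and the helper names (`frequent_itemsets`, `maximal_itemsets`, `seed_centroids`, `kmeans`), the toy corpus, and the `min_support` value are all hypothetical.

```python
from itertools import combinations


def frequent_itemsets(docs, min_support):
    """Level-wise search for term sets contained in at least min_support docs."""
    frequent = {}
    level = [frozenset([t]) for t in sorted({t for d in docs for t in d})]
    while level:
        counts = {s: sum(1 for d in docs if s <= d) for s in level}
        kept = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(kept)
        # Join frequent k-itemsets that differ in one term into (k+1)-candidates.
        level = list({a | b for a, b in combinations(kept, 2)
                      if len(a | b) == len(a) + 1})
    return frequent


def maximal_itemsets(frequent):
    """Keep only frequent itemsets that have no frequent proper superset."""
    return {s: c for s, c in frequent.items()
            if not any(s < other for other in frequent)}


def seed_centroids(docs, vocab, mfis, k):
    """Assumed seeding rule: mean vector of the docs containing each top-k MFI."""
    top = sorted(mfis, key=lambda s: (-mfis[s], sorted(s)))[:k]
    seeds = []
    for s in top:
        members = [d for d in docs if s <= d]
        seeds.append([sum(t in d for d in members) / len(members) for t in vocab])
    return seeds


def kmeans(vectors, centroids, iterations=20):
    """Plain Lloyd iterations (at least one) starting from the given centroids."""
    for _ in range(iterations):
        labels = [min(range(len(centroids)),
                      key=lambda j: sum((x - c) ** 2
                                        for x, c in zip(v, centroids[j])))
                  for v in vectors]
        updated = []
        for j, old in enumerate(centroids):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            # An empty cluster keeps its previous centroid.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else old)
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


# Toy corpus (hypothetical): each document is a set of terms.
docs = [
    {"web", "search", "engine"},
    {"web", "search", "ranking"},
    {"web", "search", "engine", "index"},
    {"gene", "protein", "cell"},
    {"gene", "protein", "dna"},
    {"gene", "protein", "cell", "dna"},
]
vocab = sorted({t for d in docs for t in d})
vectors = [[1.0 if t in d else 0.0 for t in vocab] for d in docs]

mfis = maximal_itemsets(frequent_itemsets(docs, min_support=3))
seeds = seed_centroids(docs, vocab, mfis, k=2)
labels, centroids = kmeans(vectors, seeds)
print(sorted(sorted(s) for s in mfis))
print(labels)
```

Because the seeds already sit inside the dense term groups, K-means in this toy case only has to confirm the assignment rather than recover from a poor random start, which is the sensitivity to initial conditions that the abstract refers to.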
The process of clustering documents in a manner which produces accurate and compact clusters becomes...
Abstract With the rapid growth of text documents, document clustering technique is emerging for effi...
Document clustering, which is also referred to as text clustering, is a technique of unsupervised doc...
Abstract Most existing web page clustering algorithms are based on short and uneven snippets of web...
Document clustering based on frequent itemsets can be regarded as a new method of document clustering...
Abstract: Web Content Mining generally consists of mining the content held within the web pages. The a...
Document clustering is a very hard task in automatic text processing since it requires extracting re...
In this paper an approach that is using evolving, incremental (on-line) clustering to automatically ...
Abstract—As resources become more and more available on the Web, so the difficulties associated with...
This thesis presents new methods for classification and thematic grouping of billions of web pages, ...
Most state-of-the-art document clustering methods are modifications of traditional clustering algor...
ABSTRACT: In this paper an approach that is using evolving, incremental (on-line) clustering to auto...
The dynamic web has increased exponentially over the past few years with more than thousands of docu...
The chapter provides a survey of some clustering methods relevant to the clustering document collect...
With the increase in information on the World Wide Web it has become difficult to find the desired ...