Most web page classifiers use features drawn from the page content, which means that a page has to be downloaded before it can be classified. We propose a technique to cluster web pages by means of their URLs exclusively. In contrast to other proposals, we analyse features that are outside the page; hence, we do not need to download a page to classify it. The technique is also unsupervised, requiring little intervention from the user. Furthermore, we do not need to crawl a site extensively to build a classifier for it, but only a small subset of its pages. We have performed an experiment over 21 highly visited websites to evaluate the performance of our classifier, obtaining good precision and recall results.
Junta de Andalucía P08-TIC-4100
Ministerio de C...
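To make the idea of clustering by URL alone concrete, the following is a minimal sketch and not the technique proposed in the paper, whose features are not described in this abstract. It assumes a deliberately simple, hypothetical signature: the host, the path segments with purely numeric segments generalised to a placeholder, and the sorted query parameter names. The names `url_signature` and `cluster_by_url` and the example URLs are illustrative assumptions.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlsplit


def url_signature(url: str) -> tuple:
    """Build a hypothetical signature from the URL only (no page download).

    Assumption: numeric path segments are identifiers, so they are
    replaced by a placeholder; query values are ignored, names are kept.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    generalised = tuple("<num>" if s.isdigit() else s for s in segments)
    query_names = tuple(
        sorted(name for name, _ in parse_qsl(parts.query, keep_blank_values=True))
    )
    return (parts.netloc.lower(), generalised, query_names)


def cluster_by_url(urls):
    """Group URLs that share the same signature."""
    clusters = defaultdict(list)
    for url in urls:
        clusters[url_signature(url)].append(url)
    return dict(clusters)


if __name__ == "__main__":
    sample = [
        "http://example.com/product/123",
        "http://example.com/product/456",
        "http://example.com/about",
    ]
    for signature, members in cluster_by_url(sample).items():
        print(signature, members)
```

Grouping by exact signature is the crudest possible clustering; it is used here only because it needs no labelled data and no page content, the two properties the abstract emphasises.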
Several techniques have been recently proposed to automatically generate Web wrappers, i.e., program...
Unsupervised web page classification refers to the problem of clustering the pages in a web site so ...
With the increase in information on the World Wide Web it has become difficult to find the desired ...
Clustering is the process of organizing objects into groups whose members are similar in some way. I...
With the growth of web-based applications and the increased popularity of the World Wide Web (WWW), t...
Web page clustering is a focal task in Web Mining to organize the content of websites, understanding...
Web page classification refers to the problem of automatically assigning a web page to one or more cl...
Clustering is well suited for Web mining by automatically organizing Web pages into categories each ...
In this paper we investigate the effect of using clustering algorithms in the reverse engineering fi...