Content-related metadata plays an important role in the development of intelligent web applications. One of the most established forms of content-related metadata is the assignment of web pages to content categories. We describe the Spectacle system for classifying individual web pages on the basis of their syntactic structure. This classification requires the specification of classification rules that associate common page structures with predefined classes. In this paper, we propose an approach for the automatic acquisition of these classification rules using techniques from inductive logic programming, and we describe experiments applying the approach to an existing web-based information system.
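The abstract does not specify Spectacle's rule language or its induction procedure. The sketch below only illustrates the general idea of learning structure-based classification rules from labelled example pages. It makes several assumptions, none of which come from the paper:

- A page's syntactic structure is reduced to the set of parent>child tag pairs it contains.
- A rule is a conjunction of such pairs.
- Rules are learned by greedy sequential covering.

This is a propositional simplification, not inductive logic programming proper, which learns first-order clauses. The names `features`, `learn_rules` and `classify` are hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict, FrozenSet, List, Set

# Assumption: these tags have no end tag, so they are never pushed on the stack.
VOID_TAGS = {"area", "br", "hr", "img", "input", "link", "meta"}

Features = FrozenSet[str]  # hypothetical page description: parent>child tag pairs
Rule = FrozenSet[str]      # a conjunction: every pair must occur in the page


class _StructureCollector(HTMLParser):
    """Records every parent>child tag pair encountered while parsing a page."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: List[str] = []
        self.pairs: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        parent = self.stack[-1] if self.stack else "ROOT"
        self.pairs.add(f"{parent}>{tag}")
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop up to and including the matching open tag; ignore stray end tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def features(html: str) -> Features:
    """Turn raw HTML into its set of structural features."""
    collector = _StructureCollector()
    collector.feed(html)
    collector.close()
    return frozenset(collector.pairs)


def learn_rules(pos: List[Features], neg: List[Features], max_len: int = 3) -> List[Rule]:
    """Greedy sequential covering: grow one rule at a time until it excludes all negatives."""
    rules: List[Rule] = []
    uncovered = list(pos)
    while uncovered:
        rule: Set[str] = set()
        cov_pos, cov_neg = list(uncovered), list(neg)
        while cov_neg and len(rule) < max_len:
            candidates = sorted(set().union(*cov_pos) - rule)  # sorted for deterministic ties
            if not candidates:
                break
            # Score = positives still covered minus negatives still covered.
            best = max(candidates, key=lambda f: sum(f in p for p in cov_pos)
                       - sum(f in n for n in cov_neg))
            rule.add(best)
            cov_pos = [p for p in cov_pos if best in p]
            cov_neg = [n for n in cov_neg if best in n]
        if cov_neg:
            break  # no consistent rule within max_len; stop rather than overgeneralize
        rules.append(frozenset(rule))
        uncovered = [p for p in uncovered if not rule <= p]
    return rules


def classify(page: Features, rules_by_class: Dict[str, List[Rule]]) -> List[str]:
    """Return every class that has at least one rule fully contained in the page."""
    return [c for c, rules in rules_by_class.items() if any(r <= page for r in rules)]


# Toy training data (invented for illustration): "product" pages carry an order form.
product_pages = [features(h) for h in (
    "<html><body><h1>Item</h1><table><tr><td>Price</td></tr></table>"
    "<form><input name='qty'></form></body></html>",
    "<html><body><table><tr><td>Price</td></tr></table>"
    "<form><input name='qty'><input type='submit'></form></body></html>",
)]
news_pages = [features(h) for h in (
    "<html><body><h1>Headline</h1><p>Text</p><p>More</p></body></html>",
    "<html><body><h1>Other</h1><p>Story</p><table><tr><td>x</td></tr></table></body></html>",
)]

rules = {
    "product": learn_rules(product_pages, news_pages),
    "news": learn_rules(news_pages, product_pages),
}
print(rules)
print(classify(features("<html><body><h1>New</h1><p>Fresh</p></body></html>"), rules))
```

On this toy data, the learner settles on `body>form` as the rule for "product" and `body>p` as the rule for "news", so the unseen page is classified as "news". Greedy covering was chosen only because it is short. An ILP learner would instead search over relational clauses, for example "a table whose rows contain a form", rather than over a flat set of tag pairs.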