Webpage categorization has become an important topic in recent years. Text is usually the main content of a webpage, so automatic text categorization (ATC) is the key technique for this task. For Chinese text categorization as well as Chinese webpage categorization, one of the most basic and urgent problems is the construction of a good benchmark corpus. This study presents a machine learning approach to refining a corpus for Chinese webpage categorization, in which the AdaBoost algorithm is used to identify outliers in the corpus. The standard k-nearest-neighbor (kNN) algorithm under a vector space model (VSM) is used to build the webpage categorization system. Simulation results as well as manual investigation of th...
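As a rough illustration of the kNN-under-VSM classifier mentioned in the abstract above, the following is a minimal sketch and not the authors' implementation. It assumes documents are already tokenized (Chinese text would first need word segmentation, which is not shown), uses a smoothed TF-IDF weighting and similarity-weighted voting as illustrative choices, and omits the AdaBoost outlier-detection step. The function names `tfidf_vectors`, `cosine`, and `knn_classify` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenized documents into sparse TF-IDF vectors (term -> weight).

    Returns the vectors and the IDF table so queries can be weighted the same way.
    The smoothed IDF formula is an assumption chosen for this sketch.
    """
    n = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_classify(query_tokens, train_vecs, train_labels, idf, k=3):
    """Label a query by similarity-weighted vote among its k nearest neighbors."""
    # Terms unseen in training carry no weight in the vector space.
    query = {t: tf * idf[t] for t, tf in Counter(query_tokens).items() if t in idf}
    scored = sorted(
        ((cosine(query, vec), label) for vec, label in zip(train_vecs, train_labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter()
    for sim, label in scored[:k]:
        votes[label] += sim
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy, made-up training data for illustration only.
    train_docs = [
        ["football", "match", "goal"],
        ["team", "goal", "league"],
        ["stock", "market", "price"],
        ["bank", "market", "loan"],
    ]
    train_labels = ["sports", "sports", "finance", "finance"]
    vecs, idf = tfidf_vectors(train_docs)
    print(knn_classify(["goal", "league", "team"], vecs, train_labels, idf, k=3))
```

Weighting each neighbor's vote by its cosine similarity is one common variant; a plain majority vote among the k neighbors would be an equally valid reading of "standard kNN".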
Website categorization has recently emerged as a very important task in several contexts. A huge amo...
As the internet age evolves, the volume of content hosted on the Web is rapidly expanding. With thi...
To organize and analyze massive amounts of Web information effectively, this paper implements a Chinese webpage classifier using supervised machine learning and applies it to provide a directory navigation service for large-scale Chinese webpages on the "Tianwang" (天网) search engine. ...
Text categorization is one of the typical machine learning tasks that suffer from an incomplete trai...
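Web filtering is an inductive process which automatically builds a filter by learning the...
This paper studies Chinese Web text classification, describing the basic classification process and the methods for Web text preprocessing and text feature selection, with a focus on kNN, a commonly used content-based classification algorithm. Finally, experiments were carried out to test the kNN-based Chinese We...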
Automatic categorization is a viable method to deal with the scaling problem on the World Wide Web. ...
This paper reports our comparative evaluation of three machine learning methods, namely k ...
The Internet contains a vast amount of data that is growing exponentially. To exploit this data, a W...
To improve search engine precision and promptly locate Web pages of interest to users, an investig...
This paper focuses on the high-dimensional text problems encountered in text classification. Document...
Web pages are distinguished by their topic and genre. Web page genres can improve t...
An improved algorithm, named STC-I, is proposed for Chinese Web page clustering based on Chinese lan...
The automatic categorisation of web documents is becoming crucial for organising the huge amount of...