Data acquisition is a major concern in text classification. The extensive human effort that conventional methods require to build a quality training collection might not always be available to researchers. In this paper, we explore ways to automatically collect training data by sampling the Web with a set of given class names. The basic idea is to generate appropriate keywords and submit them as queries to search engines to acquire training data. Two methods are presented in this study: one is based on sampling the common concepts among the classes, and the other on sampling the discriminative concepts for each class. A series of experiments were carried out independently on two different datasets, and the r...
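The query-building idea in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' procedure: the seed term lists, the `build_queries` and `collect_training_data` names, the `top_k` parameter, and the `search` callable (a stand-in for a real search engine client) are all hypothetical. The sketch treats a term shared by every class as a "common concept" and a term seen in exactly one class as a "discriminative concept", which is only one plausible reading of the two methods.

```python
from collections import Counter
from typing import Callable, Dict, List


def build_queries(class_terms: Dict[str, List[str]],
                  top_k: int = 2) -> Dict[str, Dict[str, List[str]]]:
    """Build per-class query strings from hypothetical seed terms."""
    # Count in how many classes each term occurs.
    class_count: Counter = Counter()
    for terms in class_terms.values():
        class_count.update(set(terms))
    n_classes = len(class_terms)

    # Assumption: a "common concept" is a term shared by every class.
    common = sorted(t for t, c in class_count.items() if c == n_classes)

    queries: Dict[str, Dict[str, List[str]]] = {}
    for name, terms in class_terms.items():
        # Assumption: a "discriminative concept" occurs in this class only.
        seen = set()
        discriminative = []
        for term in terms:
            if class_count[term] == 1 and term not in seen:
                seen.add(term)
                discriminative.append(term)
        queries[name] = {
            "common": [f"{name} {t}" for t in common[:top_k]],
            "discriminative": [f"{name} {t}" for t in discriminative[:top_k]],
        }
    return queries


def collect_training_data(queries: Dict[str, Dict[str, List[str]]],
                          search: Callable[[str], List[str]]) -> Dict[str, List[str]]:
    """Label every document returned for a class's queries with that class."""
    data: Dict[str, List[str]] = {}
    for name, groups in queries.items():
        docs: List[str] = []
        for query in groups["common"] + groups["discriminative"]:
            docs.extend(search(query))  # `search` is supplied by the caller
        data[name] = docs
    return data
```

In use, `search` would wrap whatever search engine interface is available, and the returned documents would serve as automatically labeled (and therefore noisy) training examples for each class.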
The (unheralded) first step in many applications of automated text analysis involves selecting keywo...
Web mining is a newly emerging research area concerned with analyzing the World Wide Web. It is co...
The traditional techniques rely on human effort to acquire training sets, which is expensive and ine...
In this thesis, an algorithm is presented that selects samples of documents for training text c...
As the digital age pushes forward, data and document size have been increasing rapidly. A more effic...
A major difficulty of supervised approaches for text classification is that they require a...
Learning to rank has recently become a popular approach to building ranking models for Web search. Bas...
This paper addresses the problem of clustering dynamic collections of web docum...
This paper studies training set sampling strategies in the context of statistical learning for text ...
In this paper a Web mining tool for content-based classification of Web pages is presented. The tool...
The World Wide Web has a wealth of information that is related to almost any text classification tas...
Many text databases on the web are hidden behind search interfaces, and their documents are only acc...
The paper describes possible representation models and ways of weighting text documents, w...