This paper addresses the problem of semi-supervised classification on document collections using retraining (also called self-training). A possible application is focused Web crawling, which may start with very few manually selected training documents but can be enhanced by automatically adding initially unlabeled, positively classified Web pages for retraining. Such an approach is by itself not robust and faces tuning problems regarding parameters such as the number of selected documents, the number of retraining iterations, and the ratio of positively and negatively classified samples used for retraining. The paper develops methods for automatically tuning these parameters, based on predicting the leave-one-out error for a retrained classifier...
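The retraining loop described in the abstract can be illustrated with a minimal sketch. This is not the paper's method: the nearest-centroid scorer over bag-of-words counts is a stand-in chosen to keep the example dependency-free, and the names `self_train`, `n_iter`, `n_pos`, and `n_neg` are hypothetical. The three parameters correspond to the quantities the abstract says need tuning (number of iterations, and how many positively and negatively classified documents are added per iteration); here they are fixed by hand rather than tuned from a predicted leave-one-out error.

```python
from collections import Counter
import math


def vectorize(text):
    """Bag-of-words term counts for one document."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(docs):
    """Summed term counts of a list of documents."""
    total = Counter()
    for d in docs:
        total.update(vectorize(d))
    return total


def score(doc, pos, neg):
    """Positive when the document is closer to the positive centroid."""
    v = vectorize(doc)
    return cosine(v, centroid(pos)) - cosine(v, centroid(neg))


def self_train(pos, neg, unlabeled, n_iter=2, n_pos=1, n_neg=1):
    """Assumed illustration of retraining: on each iteration, move the most
    confidently classified unlabeled documents into the training sets."""
    pos, neg, pool = list(pos), list(neg), list(unlabeled)
    for _ in range(n_iter):
        if not pool:
            break
        scored = sorted(((score(d, pos, neg), d) for d in pool), reverse=True)
        # Highest-scoring documents become new positives, lowest new negatives.
        new_pos = [d for s, d in scored[:n_pos] if s > 0]
        new_neg = [d for s, d in scored[::-1][:n_neg] if s < 0]
        pos += new_pos
        neg += new_neg
        pool = [d for d in pool if d not in new_pos and d not in new_neg]
    return pos, neg, pool


pos, neg, pool = self_train(
    ["web crawler fetches pages"],
    ["football match score goal"],
    [
        "focused crawler ranks web pages",
        "goal scored in football final",
        "crawler indexes pages",
        "weather is sunny",
    ],
)
print(pos)
print(neg)
print(pool)
```

In this toy run the two crawler documents end up in the positive set, the football document in the negative set, and the unrelated document stays unlabeled. The fixed values of `n_iter`, `n_pos`, and `n_neg` are exactly the kind of hand-set choices that the abstract identifies as fragile.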
Data acquisition is a major concern in text classification. The excessive human efforts required by ...
Automatic document classification and clustering are useful for a wide range of applications such as...
Training a system for pattern recognition is a task that requires a large amount of labeled data. Ho...
The Web includes digital libraries and billions of text documents. A fast and simple search through this...
Abstract. Co-training is a semi-supervised technique that allows classifiers to learn with fewer labe...
The creation of a training set for pattern recognition is a difficult, expensive and time-consuming ...
Abstract. Classification of text remains a challenge. Most machine learning based approaches require ...
Abstract. A major difficulty of supervised approaches for text classification is that they require a...
The continuous increase of digital documents on the web creates the need to search for information p...
59 p. In this thesis, an algorithm is presented that selects samples of documents for training text c...
Master's thesis in Computer science. Semi-supervised learning defines the techniques that fall in betw...
Abstract. Practical machine learning and data mining problems often face shortage of labeled trainin...