This paper studies training set sampling strategies in the context of statistical learning for text cate-gorization. It is argued sampling strategies favoring common categories is superior to uniform coverage or mistake-driven approaches, if performance is mea-sured by globally assessed precision and recall. The hypothesis is empirically validated by examining the performance of a nearest neighbor classifier on training samples drawn from a pool of 235,401 training texts with 29,741 distinct categories. The learning curves of the classifier are analyzed with respect to the choice of training resources, the sampling methods, the size, vocabulary and category coverage of a sample, and the category distribution over the texts in the sam-ple. A...
Kilimci, Zeynep Hilal (Dogus Author) -- Akyokuş, Selim (Dogus Author) -- Conference full title: 2016...
In order to train a classifier that generalizes well, different learning problems, in particu-lar hi...
Abstract—In text categorization (TC), which is a supervised technique, a feature vector of terms or ...
59 p.In this thesis, an algorithm is presented that selects samples of documents for training text c...
As the digital age pushes forward, data and document size have been increasing rapidly. A more effic...
International audienceWe address the problem of multi-class classification in the case where the num...
One of the fundamental machine learning tasks is that of predictive classification. Given that organ...
k is the most important parameter in a text categorization system based on the k-nearest neighbor al...
AbstractConsider the pattern recognition problem of learning multicategory classification from a lab...
Abstract. In order to reduce human efforts, there has been increasing interest in applying active le...
We present an approach to text categorization using machine learning techniques. The approach is dev...
This paper focuses on a comparative evaluation of a wide-range of text categorization methods, inclu...
Text classification involves deciding whether or not a document is about a given topic. It is an imp...
In the field of Natural Language Processing, supervised machine learning is commonly used to solve c...
Graduation date: 2000We developed and investigated machine learning methods that require\ud minimal ...
Kilimci, Zeynep Hilal (Dogus Author) -- Akyokuş, Selim (Dogus Author) -- Conference full title: 2016...
In order to train a classifier that generalizes well, different learning problems, in particu-lar hi...
Abstract—In text categorization (TC), which is a supervised technique, a feature vector of terms or ...
59 p.In this thesis, an algorithm is presented that selects samples of documents for training text c...
As the digital age pushes forward, data and document size have been increasing rapidly. A more effic...
International audienceWe address the problem of multi-class classification in the case where the num...
One of the fundamental machine learning tasks is that of predictive classification. Given that organ...
k is the most important parameter in a text categorization system based on the k-nearest neighbor al...
AbstractConsider the pattern recognition problem of learning multicategory classification from a lab...
Abstract. In order to reduce human efforts, there has been increasing interest in applying active le...
We present an approach to text categorization using machine learning techniques. The approach is dev...
This paper focuses on a comparative evaluation of a wide-range of text categorization methods, inclu...
Text classification involves deciding whether or not a document is about a given topic. It is an imp...
In the field of Natural Language Processing, supervised machine learning is commonly used to solve c...
Graduation date: 2000We developed and investigated machine learning methods that require\ud minimal ...
Kilimci, Zeynep Hilal (Dogus Author) -- Akyokuş, Selim (Dogus Author) -- Conference full title: 2016...
In order to train a classifier that generalizes well, different learning problems, in particu-lar hi...
Abstract—In text categorization (TC), which is a supervised technique, a feature vector of terms or ...