In this paper, we introduce the evolving label-set problem encountered in building real-world text classification systems. This problem arises when a text classification system trained on a label-set encounters documents of unseen classes at deployment time. We design a Class-Detector module attached to text classification systems that monitors unlabeled data, detects new classes, and suggests them to the human administrator for deployment in the original label-set. A central notion in our algorithms is the use of abstractions that group together tokens under human-understandable concepts and also provide a mechanism of assigning importance to unseen terms. We present algorithms for selecting documents for a new class based on state-o...
document are those of the author and should not be interpreted as representing the official policies...
Text categorization is the task of assigning a text document to an appropriate category in a ...
In recent years, the exponential growth of digital documents has been met by rapid progress in text ...
We introduce the evolving label-set problem encountered in building real-world text classification s...
Text classification is an active research area motivated by many real-world applications. Even so, r...
In many important text classification problems, acquiring class labels for training documents is cos...
This paper introduces an approach to text classification for semi-structured label systems that...
The thesis studies the problem of multi-label text classification, and argues that it could benefit ...
Multi-label classification is a generalization of a broader concept of multi-class classification in...
This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small...
Text mining is the discovery of valuable, yet hidden, information from text documents. Text class...
We describe work on automatically assigning labels to books using user-defined tags as the label set...
Effective incorporation of human expertise, while exerting a low cognitive load, is a critical aspec...
Machine learning approaches to multi-label document classification have to date largely relied on di...
The Web includes digital libraries and billions of text documents. A fast and simple search through this...