This paper explores the use of a statistical technique known as density estimation to potentially improve the results of text categorization systems which label documents by computing similarities between documents and categories. In addition to potentially improving a system's overall accuracy, density estimation converts similarity scores to probabilities. These probabilities provide confidence measures for a system's predictions which are easily interpretable and could potentially help to combine results of various systems. We discuss the results of three complete experiments on three separate data sets applying density estimation to the results of a TF*IDF/Rocchio system, and we compare these results to those of many competing approache...
We study an approach to text categorization that combines distributional clustering of words and a S...
Naïve Bayes, k-nearest neighbors, Adaboost, support vector machines and neural networks are five amo...
This paper introduces a term weighting method for text categorization based on smoothing ideas borro...
The Rocchio relevance feedback algorithm is one of the most popular and widely applied learning meth...
Supervised text categorization is a machine learning task where a predefined category label is autom...
The authors apply the state of the art techniques from machine learning and statistics to reconceptu...
A probabilistic analysis of the Rocchio relevance feedback algorithm, one of the most popular learni...
This paper focuses on a comparative evaluation of a wide-range of text categorization methods, inclu...
Text classification is the task of assigning predefined categories to free text documents. Due to th...
Master of ScienceDepartment of Computer ScienceWilliam HsuThis work describes a comparative study of...
Text data mining is the process of extracting and analyzing valuable information from text. A text d...
In the information age we much depend on our ability to find information hidden in mostly unstructur...
This dissertation introduces a new theoretical model for text classification systems, including syst...
Within text categorization and other data mining tasks, the use of suitable methods for term weighti...
A new approach to the Text Categorization problem is here presented. It is called Gaussian Weighti...
We study an approach to text categorization that combines distributional clustering of words and a S...
Naïve Bayes, k-nearest neighbors, Adaboost, support vector machines and neural networks are five amo...
This paper introduces a term weighting method for text categorization based on smoothing ideas borro...
The Rocchio relevance feedback algorithm is one of the most popular and widely applied learning meth...
Supervised text categorization is a machine learning task where a predefined category label is autom...
The authors apply the state of the art techniques from machine learning and statistics to reconceptu...
A probabilistic analysis of the Rocchio relevance feedback algorithm, one of the most popular learni...
This paper focuses on a comparative evaluation of a wide-range of text categorization methods, inclu...
Text classification is the task of assigning predefined categories to free text documents. Due to th...
Master of ScienceDepartment of Computer ScienceWilliam HsuThis work describes a comparative study of...
Text data mining is the process of extracting and analyzing valuable information from text. A text d...
In the information age we much depend on our ability to find information hidden in mostly unstructur...
This dissertation introduces a new theoretical model for text classification systems, including syst...
Within text categorization and other data mining tasks, the use of suitable methods for term weighti...
A new approach to the Text Categorization problem is here presented. It is called Gaussian Weighti...
We study an approach to text categorization that combines distributional clustering of words and a S...
Naïve Bayes, k-nearest neighbors, Adaboost, support vector machines and neural networks are five amo...
This paper introduces a term weighting method for text categorization based on smoothing ideas borro...