Abstract. Over the years many models had been proposed for text catego-rization. One of the most widely applied is the vector space model, assuming independence between indexing terms. Since training corpora sizes are rel-atively small – compared to ∞ – the generalization power of the learning algorithms is relatively low. Using a bigger unannotated text corpus can boost the representation and hence the learning process. Based on the work of Gabrilovich and Markovitch we use Wikipedia articles to give word dis-tributional representation for documents. Since this causes dimensionality increase, some feature clustering is needed. For this end we use LSA. 1
Document classification is a key task for many text min-ing applications. However, traditional text ...
Wikipedia has been applied as a background knowledge base to various text mining problems, but very ...
We study an approach to text categorization that combines distributional clustering of words and a S...
When humans approach the task of text categorization, they interpret the specific wording of the doc...
Master of ScienceDepartment of Computer ScienceWilliam HsuThis work describes a comparative study of...
The exponential growth of text documents available on the Internet has created an urgent need for ac...
Unsupervised learning text representations aims at converting natural languages into vector represen...
Classification We propose a new algorithm for dimensionality reduction and unsupervised text classif...
In traditional text clustering methods, documents are represented as “bags of words ” without consid...
We present an approach to text categorization using machine learning techniques. The approach is dev...
In this paper we present a novel approach to learning semantic models for multiple domains, which we...
This dissertation introduces a new theoretical model for text classification systems, including syst...
This paper focuses on a comparative evaluation of a wide-range of text categorization methods, inclu...
In practice, machine learning systems deal with multiple datasets over time. When the feature spaces...
Text categorization, or the assignment of natural language texts to predefined categories based on t...
Document classification is a key task for many text min-ing applications. However, traditional text ...
Wikipedia has been applied as a background knowledge base to various text mining problems, but very ...
We study an approach to text categorization that combines distributional clustering of words and a S...
When humans approach the task of text categorization, they interpret the specific wording of the doc...
Master of ScienceDepartment of Computer ScienceWilliam HsuThis work describes a comparative study of...
The exponential growth of text documents available on the Internet has created an urgent need for ac...
Unsupervised learning text representations aims at converting natural languages into vector represen...
Classification We propose a new algorithm for dimensionality reduction and unsupervised text classif...
In traditional text clustering methods, documents are represented as “bags of words ” without consid...
We present an approach to text categorization using machine learning techniques. The approach is dev...
In this paper we present a novel approach to learning semantic models for multiple domains, which we...
This dissertation introduces a new theoretical model for text classification systems, including syst...
This paper focuses on a comparative evaluation of a wide-range of text categorization methods, inclu...
In practice, machine learning systems deal with multiple datasets over time. When the feature spaces...
Text categorization, or the assignment of natural language texts to predefined categories based on t...
Document classification is a key task for many text min-ing applications. However, traditional text ...
Wikipedia has been applied as a background knowledge base to various text mining problems, but very ...
We study an approach to text categorization that combines distributional clustering of words and a S...