Master of ScienceDepartment of Computer ScienceWilliam HsuThis work describes a comparative study of empirical methods for categorization of new articles within text corpora: unsupervised learning for an unlabeled corpus of text documents and supervised learning for hand-labeled corpus. The goal of text categorization is to organize natural language (i.e. human language) documents into categories that are either predefined or that are inherently grouped by similar meaning. The first approach, automatic classification of texts, can be handy when handling massive amounts of data and has many applications such as automated indexing of scientific articles, spam filtering, classification of news articles etc. Classification using supervised or s...