Text categorization maps documents to a fixed set of labels. A related and equally important problem is assigning labels to large corpora. With a deluge of documents from sources such as the World Wide Web, manual labeling by domain experts is prohibitively expensive. Reducing the effort of labeling documents has received considerable attention, but most of that work relies on some form of supervised or semi-supervised learning. This motivates the need for automatic methods of annotating documents with labels. In this work we explore a novel method of assigning labels to documents without using any training data. The proposed method uses clustering to build semantically related sets that are us...
In text classification the amount and quality of training data is crucial for the performance of the...
In order to support the navigation in huge document collections efficiently, tagged hierarchical st...
Finding and labelling semantic feature patterns of documents in a large, spatial corpus is a challe...
This paper introduces an approach to text classification for semi-structured label systems that...
Master of Science. Department of Computer Science. William Hsu. This work describes a comparative study of...
This paper discusses a new type of semi-supervised document clustering that uses partial supervisio...
The purpose of text clustering in information retrieval is to discover groups of semantically relate...
Because of the explosion of digital and online text information, automatic organization of documents...
Semi-supervised learning methods construct classifiers using both labeled and unlabeled training da...
Semantic knowledge is important in many areas of natural language processing. We propose a...
Document classification is a large body of research; many approaches were proposed for single label an...