We study the problem of weakly supervised text classification, which aims to classify text documents into a set of pre-defined categories given only the category surface names and no annotated training documents. Most existing classifiers leverage textual information in each document. However, in many domains, documents are accompanied by various types of metadata (e.g., the authors, venue, and year of a research paper). These metadata and their combinations may serve as strong category indicators in addition to textual content. In this paper, we explore the potential of using metadata to help weakly supervised text classification. Specifically, we model the relationships between documents and metadata via a heterogeneous informa...
This paper reports on an on-going research project to create educational semantic metadata out of fo...
When humans approach the task of text categorization, they interpret the specific wording of the doc...
Text Categorization (TC) is the automatic classification of text documents under pre-defined categor...
Document categorization, which aims to assign a topic label to each document, plays a fundamental ro...
Text classification plays a fundamental role in transforming unstructured text data to structured kn...
Categorizing documents into a given label hierarchy is intuitively appealing due to the ubiquity of ...
Multi-label text classification refers to the problem of assigning each given document its most rele...
Solving text classification in a weakly supervised manner is important for real-world applications w...
Large-scale multi-label text classification (LMTC) aims to associate a document with its relevant la...
In this paper, we introduce a method for categorizing digital items according to their topic, only ...
Deep neural networks are gaining increasing popularity for the classic text classification task, due...
Structured knowledge representations are becoming central to the area of Information Science. Search...
Thesis (Ph.D.)--University of Washington, 2013. Text classification is a general and important machine...
In this paper we illustrate a system aimed at solving a longstanding and challenging problem: acquir...
The cluster assumption is exploited by most semi-supervised learning (SSL) methods. However, if the...