Solving text classification in a weakly supervised manner is important for real-world applications where human annotations are scarce. In this paper, we propose querying a masked language model with cloze-style prompts to obtain supervision signals. We design a prompt that combines the document itself with the phrase "this article is talking about [MASK]." A masked language model can generate words for the [MASK] token, and these generated words, which summarize the content of the document, can be used as supervision signals. We propose a latent variable model that jointly learns a word-distribution learner, which associates the generated words with pre-defined categories, and a document classifier, without using any annotated data. Evaluation on three datasets, ...
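To make the cloze-querying step concrete, the following is a minimal sketch using the Hugging Face transformers fill-mask pipeline, assuming the approach sketched in the abstract. The choice of bert-base-uncased, the example document, and the top_k value are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: query a masked language model with a cloze-style prompt
# to obtain candidate "summary" words for a document (assumed setup; the
# paper's exact model, prompt handling, and decoding may differ).
from transformers import pipeline

# Assumed choice of masked language model for illustration.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Hypothetical example document.
document = "The team scored three goals in the final minutes to win the championship."

# Combine the document with the cloze prompt described in the abstract.
prompt = f"{document} this article is talking about {fill_mask.tokenizer.mask_token}."

# The predicted words (e.g. "sports", "football") serve as weak supervision signals.
for prediction in fill_mask(prompt, top_k=10):
    print(prediction["token_str"], round(prediction["score"], 4))
```

In the approach described above, these candidate words would not be used directly as labels; the latent variable model associates them with the pre-defined categories while the document classifier is trained simultaneously.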