In lexicon-based classification, documents are assigned labels by comparing the number of words that appear from each of two opposed lexicons, such as positive and negative sentiment. Creating such word lists is often easier than labeling instances, and they can be debugged by non-experts if classification performance is unsatisfactory. However, there is little analysis or justification of this classification heuristic. This paper describes a set of assumptions that can be used to derive a probabilistic justification for lexicon-based classification, as well as an analysis of its expected accuracy. One key assumption behind lexicon-based classification is that all words in each lexicon are equally predictive. This is rarely true in practice, which...
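The counting heuristic described in the abstract can be sketched in a few lines of Python. This is only an illustration, not the paper's implementation: the two tiny word lists, the regex tokenizer, and the names `POSITIVE`, `NEGATIVE`, `tokenize`, and `classify` are all assumptions introduced here. Every lexicon word contributes exactly one vote, which mirrors the equal-predictiveness assumption the abstract calls out.

```python
import re

# Hypothetical toy lexicons; real lexicons contain hundreds or thousands of entries.
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "hate"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (a simplistic assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def classify(text, positive=POSITIVE, negative=NEGATIVE):
    """Label a document by comparing how many tokens fall in each lexicon.

    Each matching token counts once, so all lexicon words are treated
    as equally predictive.
    """
    tokens = tokenize(text)
    pos_count = sum(1 for tok in tokens if tok in positive)
    neg_count = sum(1 for tok in tokens if tok in negative)
    if pos_count > neg_count:
        return "positive"
    if neg_count > pos_count:
        return "negative"
    return "tie"  # equal counts, including the case where no lexicon word appears


if __name__ == "__main__":
    print(classify("I love this great movie, but the ending was bad"))
```

Because the label depends only on the two counts, a non-expert can debug a misclassification by inspecting which lexicon words matched and editing the lists accordingly.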
Data labeling is a critical aspect of sentiment analysis that requires assigning labels to text data...
Large-scale social media classification faces the following two challenges: algorithms can be hard t...
document are those of the author and should not be interpreted as representing the official policies...
The automated classification of text documents is an active research challenge in document-oriented ...
Text examples must be exploited in the acquisition of lexical structures. However, neither syntactic...
Abstract. The task of text classification is the assignment of labels that describe texts’ char-act...
Ambiguity resolution in the parsing of natural language requires a vast repository of knowledge to g...
This paper demonstrates how unsupervised techniques can be used to learn models of deep linguistic s...
This paper analyzes distributional properties that facilitate the categorization of words into lexic...
For sentiment analysis, we address the problem of supervised-learning being domain-dependent. Additi...
In this paper we report on an unsupervised approach to learning Categorial Grammar (CG) lexicons. Th...
We propose a novel method for counting sentiment orientation that outperforms supervised learning ap...
Certain common lexical attributes such as polarity and formality are continuous, creating challenge...
Text classification techniques such as Bayesian classifiers have been proved to be giving as good or...
In many important text classification problems, acquiring class labels for training documents is cos...