Text classification often suffers from imbalanced training data. This holds for sentiment analysis and is especially pronounced in emotion classification, where multiple emotion categories are very likely to produce naturally skewed training data. Various sampling methods have been proposed to improve classification performance by reducing the imbalance ratio between training classes. However, data sparseness and the small disjunct problem remain obstacles to generating new samples for minority classes when the data are skewed and limited. Overcoming this problem requires methods that produce meaningful samples for the smaller classes rather than simply duplicating existing ones. In this paper, we present an oversampling method based on word em...
Compositional data are a special kind of data, represented as a proportion car...
Our work analyzed the relationship between the domain type of the word embeddings used to create sen...
Word embeddings are low-dimensional distributed representations that encompass a set of language mod...
Imbalanced training data always puzzles the supervised-learning-based emotion and sentimen...
Since some sentiment words have similar syntactic and semantic features in the corpus, existing pre-...
Sentiment classification is an important task that has gained extensive attention in both academia and ...
Word embeddings are effective intermediate representations for capturing semantic regularities betwe...
Semantic word spaces have been very useful but cannot express the meaning of longer phrases in a pr...
Context-based word embedding learning approaches can model rich semantic and syntactic information. ...
Most existing continuous word representation learning algorithms usually only ...
In this study, the problems caused by unbalanced data sets on sentiment analysis are discussed and t...
Moving beyond the dominant bag-of-words approach to sentiment analysis, we introduce an alternative p...
We propose a novel method for enriching word embeddings without the need for a labeled corpus. Instea...
Sentiment analysis is a well-known and rapidly expanding study topic in natural language processing ...