Many machine learning datasets are noisy, with a substantial number of mislabeled instances. This noise yields sub-optimal classification performance. In this paper we study a large, low-quality annotated dataset, created quickly and cheaply using Amazon Mechanical Turk to crowdsource annotations. We describe computationally cheap feature weighting techniques and a novel non-linear distribution spreading algorithm that can be used to iteratively and interactively correct mislabeled instances, significantly improving annotation quality at low cost. Eight different emotion extraction experiments on Twitter data demonstrate that our approach is just as effective as more computationally expensive techniques. Our techniques save a considerable...
In recent decades, the availability of a large amount of data has propelled the field of machine lea...
Manual and high-quality annotation of social media data has enabled companies and researchers to dev...
This paper describes our approach to the SemEval 2016 task 4, “Sentiment Analysis in Twitter”, where...
Crowdsourcing has become a popular approach for annotating the large quantities of data required to...
With the proliferation of social media, gathering data has become cheaper and easier than before. Ho...
Recognizing human activities from wearable sensor data is an important problem, particularly for hea...
Crowd-sourcing has become a popular means of acquiring labeled data for many tasks where humans are ...
Crowd-sourcing has become a popular means of acquiring labeled data for a wide variety of tasks wher...
Crowd-sourcing has become a popular means of acquiring labeled data for many tasks where humans are ...
With the proliferation of social media, gathering data has become cheaper and easier than before....
Elections unleash strong political views on Twitter, but what do people really think about politics? ...
This paper studies the active learning problem in crowdsourcing settings, where multiple imperfect a...
This repository contains the manuscript of my Ph.D. dissertation. Here is the abstract of the manusc...
This paper presents Scalpel-CD, a first-of-its-kind system that leverages both human and machine int...