This paper presents Scalpel-CD, a first-of-its-kind system that leverages both human and machine intelligence to debug noisy labels in the training data of machine learning systems. Our system identifies potentially wrong labels using a deep probabilistic model, which infers the latent class of a high-dimensional data instance by exploiting data distributions in the underlying latent feature space. To minimize crowd effort, it employs a data sampler that selects the data instances that would benefit the most from being inspected by the crowd. The manually verified labels are then propagated to similar data instances in the original training data by exploiting the underlying data structure, thus scaling out the contribution from th...
Machine learning is a garbage-in-garbage-out system, which relies on high-quality labeled data to tr...
The coupling of machine intelligence and human intelligence has the potential to empower humans with...
While mislabeled or ambiguously-labeled samples in the training set could negatively affect the perf...
Thesis (Ph.D.)--University of Washington, 2017-08. Artificial intelligence and machine learning power ...
In many domains, collecting sufficient labeled training data for supervised machine learning require...
Although supervised learning requires a labeled dataset, obtaining labels from experts is generally ...
With crowdsourcing systems, labels can be obtained with low cost, which facilitates the creation of ...
Nowadays, crowdsourcing is being widely used to collect training data for solving classification pro...
The supervised learning-based recommendation models, whose infrastructures are sufficient training s...
This thesis focuses on the aspect of label noise for real-life datasets. Due to the upcoming growing...
With the emergence of search engines and crowd-sourcing websites, machine learning practitioners are...
Noisy labels are commonly present in data sets automatically collected from the internet, mislabeled...
There is an emerging trend to leverage noisy image datasets in many visual recogni...
Over the last few years, deep learning has revolutionized the field of machine learning by dramatica...