Machine learning often relies on costly labeled data, which impedes its application to new classification and information extraction problems. This motivates the development of methods that leverage our abundant prior knowledge about these problems in learning. Several recently proposed methods incorporate prior knowledge with constraints on the expectations of a probabilistic model. Building on this work, we envision an interactive training paradigm in which practitioners perform evaluation, analyze errors, and provide and refine expectation constraints in a closed loop. In this paper, we focus on several key subproblems in this paradigm that can be cast as selecting a representative sample of the unlabeled data for the practitioner to ins...
In the literature of supervised learning, most existing studies assume that the labels provided by t...
Many applications of supervised machine learning consist of training data with a large number of fea...
This paper addresses the problem of learn-ing when high-quality labeled examples are an expensive re...
Machine learning often relies on costly labeled data, and this impedes its application to new classi...
In a real-world application of supervised learning, we have a training set of examples with labels, ...
Uncertainty sampling methods iteratively request class labels for training instances whose classes a...
In the passive, traditional, approach to learning, the information available to the learner is a set...
Given k pre-trained classifiers and a stream of unlabeled data examples, how can we actively decide ...
A constant challenge to researchers is the lack of large and timely datasets of domain examples (res...
Most positive and unlabeled data is subject to selection biases. The labeled examples can, for examp...
A constant challenge to researchers is the lack of large and timely datasets of domain examples (res...
How to sample training/validation data is an important question for machine learning models, especia...
A constant challenge to researchers is the lack of large and timely datasets of domain examples (res...
Classification is an important task in many fields including biomedical research and ma-chine learni...
Thesis (Ph.D.)--Boston UniversityIn a typical discriminative learning setting, a set of labeled trai...
In the literature of supervised learning, most existing studies assume that the labels provided by t...
Many applications of supervised machine learning consist of training data with a large number of fea...
This paper addresses the problem of learn-ing when high-quality labeled examples are an expensive re...
Machine learning often relies on costly labeled data, and this impedes its application to new classi...
In a real-world application of supervised learning, we have a training set of examples with labels, ...
Uncertainty sampling methods iteratively request class labels for training instances whose classes a...
In the passive, traditional, approach to learning, the information available to the learner is a set...
Given k pre-trained classifiers and a stream of unlabeled data examples, how can we actively decide ...
A constant challenge to researchers is the lack of large and timely datasets of domain examples (res...
Most positive and unlabeled data is subject to selection biases. The labeled examples can, for examp...
A constant challenge to researchers is the lack of large and timely datasets of domain examples (res...
How to sample training/validation data is an important question for machine learning models, especia...
A constant challenge to researchers is the lack of large and timely datasets of domain examples (res...
Classification is an important task in many fields including biomedical research and ma-chine learni...
Thesis (Ph.D.)--Boston UniversityIn a typical discriminative learning setting, a set of labeled trai...
In the literature of supervised learning, most existing studies assume that the labels provided by t...
Many applications of supervised machine learning consist of training data with a large number of fea...
This paper addresses the problem of learn-ing when high-quality labeled examples are an expensive re...