With the deluge of digitized information in the Big Data era, massive datasets are increasingly available for learning predictive models. However, in many practical situations, poorly controlled data acquisition processes can jeopardize the outputs of machine learning algorithms, and selection bias issues now receive much attention in the literature. The present article investigates how to extend Empirical Risk Minimization, the principal paradigm in statistical learning, when training observations are generated from biased models, i.e., from distributions that differ from that of the test/prediction stage but are absolutely continuous with respect to it. Precisely, we show how to build a "nea...
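A minimal sketch of the debiased (importance-weighted) empirical risk described above, assuming a simple covariate-shift scenario in which the train and test input densities, and hence the likelihood ratio w(x) = p_test(x) / p_train(x), are known exactly; in practice the ratio must be estimated, which is where a "nearly" debiased construction comes in. All distributions and parameters below are illustrative assumptions, not the article's experimental setup.

```python
# Sketch: importance-weighted ERM under covariate shift, assuming the
# likelihood ratio w(x) = p_test(x) / p_train(x) is known. The weighted
# empirical risk (1/n) * sum_i w(x_i) * loss(f(x_i), y_i) is an unbiased
# estimate of the test risk whenever the training law dominates the test law.
import numpy as np
from scipy.stats import norm
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Training inputs drawn from N(0, 1); test-time inputs follow N(1, 1).
n = 2000
x = rng.normal(0.0, 1.0, size=n)
y = (x + 0.3 * rng.normal(size=n) > 0.5).astype(int)  # toy labels

# Likelihood ratio between the (assumed known) test and train densities.
w = norm.pdf(x, loc=1.0, scale=1.0) / norm.pdf(x, loc=0.0, scale=1.0)

# Most ERM solvers expose per-example weights directly.
clf = LogisticRegression().fit(x.reshape(-1, 1), y, sample_weight=w)
```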
Access to a representative sample from the population is an assumption that underpins all of machine...
From the sampling of data to the initialisation of parameters, randomness is ubiquitous in modern Ma...
In many machine learning domains, misclassification costs are different for different examples, in t...
The generalization ability of minimizers of the empirical risk in the context of binary cla...
Most positive and unlabeled data is subject to selection biases. The labeled examples can, for examp...
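For reference, a sketch of the classical unbiased PU risk estimator (in the style of du Plessis et al.), stated under the selected-completely-at-random assumption that work on selection-biased PU data, such as the abstract above, seeks to relax; the class prior pi and the loss below are assumed known/chosen, and the scores are hypothetical model outputs.

```python
# Unbiased risk from positive-unlabeled data under SCAR:
# R = pi * E_P[loss(s, +1)] - pi * E_P[loss(s, -1)] + E_U[loss(s, -1)],
# which equals the fully supervised risk because the unlabeled sample
# follows the marginal mixture pi * P_+ + (1 - pi) * P_-.
import numpy as np

def pu_risk(scores_pos, scores_unl, pi, loss):
    r_pos = loss(scores_pos, +1).mean()          # positives as positives
    r_pos_as_neg = loss(scores_pos, -1).mean()   # positives as negatives
    r_unl_as_neg = loss(scores_unl, -1).mean()   # unlabeled as negatives
    return pi * r_pos - pi * r_pos_as_neg + r_unl_as_neg

# Example with the logistic loss on real-valued scores.
logistic = lambda s, y: np.log1p(np.exp(-y * s))
scores_p = np.array([2.1, 0.5, 1.3])
scores_u = np.array([-0.2, 0.8, -1.5, 0.1])
print(pu_risk(scores_p, scores_u, pi=0.4, loss=logistic))
```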
In survey methodology, inverse probability weighted (Horvitz-Thompson) estimation has become an indi...
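A minimal sketch of the Horvitz-Thompson estimator referenced above: each sampled unit is weighted by the reciprocal of its inclusion probability, which yields an unbiased estimate of the population total provided every unit has a positive probability of being sampled. The numbers below are illustrative.

```python
# Horvitz-Thompson (inverse probability weighted) estimate of a population
# total: E[sum_{i in S} y_i / pi_i] = sum_{i in U} y_i, so the estimator is
# unbiased whenever all inclusion probabilities pi_i are strictly positive.
import numpy as np

def horvitz_thompson_total(y_sample, inclusion_probs):
    return np.sum(y_sample / inclusion_probs)

y_sample = np.array([4.0, 7.0, 1.0])
probs = np.array([0.5, 0.25, 0.1])  # inclusion probabilities of the draws
print(horvitz_thompson_total(y_sample, probs))
```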
I describe an exploration criterion that attempts to minimize the error of a learner by minimizing...
Machine learning algorithms are celebrated for their impressive performance on many tasks that we tho...
We derive a family of loss functions to train models in the presence of sampling bias. Examples are ...
Biased data represents a significant challenge for the proper functioning of machine learning models...
Datasets are rarely a realistic approximation of the target population. Say, prevalence is misrepres...
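As a concrete instance of correcting a misrepresented prevalence, the sketch below reweights examples by the ratio of an assumed target class prior to the observed sample prior, so that the weighted empirical risk matches the population risk under label shift; both priors here are hypothetical.

```python
# Label-shift reweighting: if class c appears with frequency q_c in the
# sample but has prevalence p_c in the target population, weighting each
# example by p_c / q_c makes the weighted empirical risk unbiased for the
# population risk (binary case shown).
import numpy as np

def prevalence_weights(labels, sample_prior, target_prior):
    labels = np.asarray(labels)
    return np.where(labels == 1,
                    target_prior / sample_prior,
                    (1 - target_prior) / (1 - sample_prior))

w = prevalence_weights([1, 1, 0, 1], sample_prior=0.75, target_prior=0.3)
```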
In certain situations that will undoubtedly become more and more common in the Bi...
We study the problem of learning-to-learn: inferring a learning algorithm that works well on tasks s...
We address the practical construction of asymptotic confidence intervals for smooth (i.e., pathwise ...
We have considered the problem in which a biased sample is selected from a finite population, and th...