Access to a representative sample from the population is an assumption that underpins all of machine learning. Selection effects can instead cause observations to come from a subpopulation, in which case our inferences may be biased. It is therefore important to know whether or not a sample is affected by selection effects. We study the conditions under which selection bias can be identified and give results for both parametric and non-parametric families of distributions. Based on these results, we develop two practical methods to determine whether or not an observed sample comes from a distribution subject to selection bias. Through extensive evaluation on synthetic and real-world data, we verify that our methods beat the state of the art bo...
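The Python below is a minimal illustration of the problem this abstract describes. It is not the authors' two methods, which the truncated text does not specify. It simulates selection bias and applies one simple parametric check, assuming the population is known to be N(0, 1) and that each unit is kept with probability sigmoid(strength * x). The helper names `simulate_selection` and `z_test_known_normal`, the logistic selection rule, and the known-variance z-test on the mean are all hypothetical choices made for this sketch.

```python
import math
import random
import statistics


def simulate_selection(n, strength, seed=0):
    """Draw n points from N(0, 1) and keep each with probability sigmoid(strength * x).

    Hypothetical selection rule: strength = 0 keeps every unit with probability 0.5
    (no bias); larger strength favours units with large x.
    """
    rng = random.Random(seed)
    kept = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        p_keep = 1.0 / (1.0 + math.exp(-strength * x))
        if rng.random() < p_keep:
            kept.append(x)
    return kept


def z_test_known_normal(sample, mu=0.0, sigma=1.0):
    """Two-sided z-test of H0: the population mean is mu, with sigma assumed known.

    Returns (z, p_value). A small p-value suggests the sample does not look like
    an unselected draw from N(mu, sigma^2).
    """
    n = len(sample)
    z = (statistics.fmean(sample) - mu) / (sigma / math.sqrt(n))
    # P(|Z| > |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


unbiased = simulate_selection(20_000, strength=0.0)
biased = simulate_selection(20_000, strength=2.0)
print(z_test_known_normal(unbiased))  # p-value is typically not small
print(z_test_known_normal(biased))    # p-value is essentially 0
```

A caveat on this design: a test on the mean only detects selection that shifts the mean. Selection that depends symmetrically on |x| keeps the sample mean near 0, so it would pass this check even though the sample is biased.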
We consider the scenario where training and test data are drawn from different distributions, common...
Most statistical methods assume that samples are representative of a target population of interest, ...
Cause-and-effect relations are one of the most valuable types of knowledge sought after throughout t...
People often extrapolate from data samples, inferring properties of the population like the rate of ...
Peer Reviewed https://deepblue.lib.umich.edu/bitstream/2027.42/151805/1/rssc12371_am.pdf https://deepb...
This thesis presents a creative and practical approach to dealing with the problem of selecti...
Selection bias is caused by preferential exclusion of units from the samples and represents a major ...
Objectives: Spurious associations between an exposure and outcome not describing the causal estimand...
We show with a simulation that nonrepresentative sampling of two discrete fitness classes leads to b...
This paper presents a theoretical analysis of sample selection bias correction. The sampl...
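One standard way to correct sample selection bias is to reweight each observed unit by the inverse of its selection probability. The Python sketch below assumes that probability is known exactly (sigmoid(strength * x), a hypothetical selection rule) and compares the naive mean of the selected sample with a self-normalised inverse-probability-weighted (IPW) mean. The helper names are illustrative, and the sketch is not presented as the analysis carried out in this paper.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def biased_sample(n, strength, seed=1):
    """Draw from N(0, 1); keep each unit with probability sigmoid(strength * x)."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        if rng.random() < sigmoid(strength * x):
            kept.append(x)
    return kept


def ipw_mean(sample, strength):
    """Self-normalised inverse-probability-weighted mean.

    Each kept unit gets weight 1 / P(selected | x). When that probability is
    known and strictly positive, the estimate converges to the population mean
    as the sample grows.
    """
    weights = [1.0 / sigmoid(strength * x) for x in sample]
    return sum(w * x for w, x in zip(weights, sample)) / sum(weights)


sample = biased_sample(20_000, strength=1.0)
naive = sum(sample) / len(sample)
corrected = ipw_mean(sample, strength=1.0)
print(f"naive mean: {naive:.3f}, IPW mean: {corrected:.3f}  (population mean is 0)")
```

The weights grow as the selection probability shrinks, so this correction can have high variance when some units are rarely selected.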
Accurately measuring discrimination in machine learning-based automated decision systems is required...