In many applications, training data is provided as related datasets obtained from several sources, which typically affects the sample distribution. The learned classification models, which are expected to perform well on similar data from new sources, often suffer from bias introduced by what we call 'spurious' samples -- samples that arise from source-specific characteristics and are not representative of any other part of the data. Since standard outlier detection and robust classification usually fall short of identifying groups of spurious samples, we propose a procedure that identifies the common structure across datasets by minimizing a multi-dataset divergence metric, thereby increasing accuracy on new datasets.
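The abstract above does not specify the divergence metric or the identification procedure, but the core idea can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the "divergence" is taken to be the average squared distance between dataset means, and spurious samples are flagged greedily as those whose removal most reduces that cross-dataset divergence.

```python
import numpy as np

def mean_pairwise_divergence(datasets):
    """Average squared distance between dataset means -- a simple
    stand-in for a multi-dataset divergence metric (assumption)."""
    means = [d.mean(axis=0) for d in datasets]
    total, count = 0.0, 0
    for i in range(len(means)):
        for j in range(i + 1, len(means)):
            total += float(np.sum((means[i] - means[j]) ** 2))
            count += 1
    return total / max(count, 1)

def flag_spurious(datasets, k=1):
    """For each dataset, flag the k samples whose individual removal
    most reduces the cross-dataset divergence (illustrative only)."""
    flagged = []
    for idx, d in enumerate(datasets):
        scores = []
        for s in range(len(d)):
            # divergence of the collection with sample s left out of dataset idx
            reduced = [np.delete(d, s, axis=0) if i == idx else x
                       for i, x in enumerate(datasets)]
            scores.append(mean_pairwise_divergence(reduced))
        # samples whose removal yields the lowest remaining divergence
        flagged.append(np.argsort(scores)[:k].tolist())
    return flagged
```

With three small datasets where one contains an obvious source-specific point, the procedure flags that point, since dropping it best aligns the dataset means.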
We consider prediction based on a main model. When the main model shares partial parameters with sev...
In machine learning, predictors trained on a given data distribution are usually guaranteed to perfo...
Existing out-of-distribution detection models rely on the prediction of a single classifier...
In several scientific applications, data are generated from two or more diverse sources (views) with...
Training machine learning algorithms on augmented data from different related sources is a challenging...
Predictive accuracy claims should give explicit descriptions of the steps followed, with access to t...
Discriminative learning methods for classification perform well when training and test data are draw...
This thesis describes novel approaches to the problem of outlier detection. It is one of the most im...
Training deep learning models for time-series prediction of a target population often requires a sub...
This paper aims at characterizing classification problems to find the main features that determine t...
Methods addressing spurious correlations such as Just Train Twice (JTT, arXiv:2107.09044v2) involve ...
Domain adaptation (DA), which leverages labeled data from related source domains, comes in handy whe...
A familiar problem in machine learning is to determine which data points are outliers when the unde...
Outliers usually spread across regions of low density. However, due to the absence or scarcity of ou...