When deployed in the real world, machine learning models inevitably encounter changes in the data distribution, and certain -- but not all -- distribution shifts could result in significant performance degradation. In practice, it may make sense to ignore benign shifts, under which the performance of a deployed model does not degrade substantially, making interventions by a human expert (or model retraining) unnecessary. While several works have developed tests for distribution shifts, these typically either use non-sequential methods, or detect arbitrary shifts (benign or harmful), or both. We argue that a sensible method for firing off a warning has to both (a) detect harmful shifts while ignoring benign ones, and (b) allow continuous monitoring...
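The abstract above is cut off before the construction itself, so the following is only a minimal illustrative sketch of the kind of monitor it argues for: fire a warning only when a time-uniform lower confidence bound on the target-domain risk exceeds the source risk by more than a tolerance, so that benign shifts are ignored and continuous monitoring does not inflate the false-alarm rate. This is not the paper's method: losses are assumed bounded in [0, 1], a crude Hoeffding-plus-union-bound construction stands in for the tighter confidence sequences such work relies on, and the names lower_conf_bound, monitor, source_risk, and tolerance are all hypothetical.

import numpy as np

def lower_conf_bound(losses, alpha=0.05):
    # Anytime-valid lower bound on the mean of losses in [0, 1]: Hoeffding's
    # inequality at time t with error budget alpha * 6 / (pi^2 * t^2), which
    # sums to alpha over all t (a conservative stand-in for a real confidence
    # sequence).
    t = len(losses)
    alpha_t = alpha * 6.0 / (np.pi ** 2 * t ** 2)
    radius = np.sqrt(np.log(1.0 / alpha_t) / (2.0 * t))
    return float(np.mean(losses)) - radius

def monitor(loss_stream, source_risk, tolerance=0.05, alpha=0.05):
    # Fire a warning only for harmful shifts: the target risk provably exceeds
    # source_risk + tolerance, uniformly over time (false-alarm rate <= alpha).
    observed = []
    for t, loss in enumerate(loss_stream, start=1):
        observed.append(loss)
        if lower_conf_bound(observed, alpha) > source_risk + tolerance:
            return t      # harmful shift flagged at time t
    return None           # stream ended without a warning

As a toy usage example, simulated 0/1 losses whose error rate jumps from 0.10 to 0.30 trigger a warning shortly after the shift, while a stream that stays near 0.10 would not:

rng = np.random.default_rng(0)
stream = np.concatenate([rng.binomial(1, 0.10, 500), rng.binomial(1, 0.30, 2000)])
print(monitor(stream, source_risk=0.10, tolerance=0.05))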
Since out-of-distribution generalization is a generally ill-posed problem, various proxy targets (e....
Deep Neural Networks (DNNs) are prone to learning spurious features that correlate with the label du...
One of the biggest challenges of employing supervised deep learning approaches is their inability to...
Recent interest in the external validity of prediction models (i.e., the problem of different train ...
Monitoring machine learning models once they are deployed is challenging. It is even more challengin...
When monitoring machine learning systems, two-sample tests of homogeneity form the foundation upon w...
This paper studies model transferability when human decision subjects respond to a deployed machine ...
We introduce a framework for calibrating machine learning models so that their predictions satisfy e...
A common use case of machine learning in real world settings is to learn a model from historical dat...
Machine learning algorithms typically assume that training and test examples are drawn from the same...
The performance of machine learning models under distribution shift has been the focus of the commun...
Designing robust models is critical for reliable deployment of artificial intelligence systems. Deep...
Deployed machine learning (ML) models often encounter new user data that differs from their training...
In machine learning, we traditionally evaluate the performance of a single model, averaged over a co...
When the test distribution differs from the training distribution, machine learning models can perfo...