In a well-calibrated risk prediction model, the average predicted probability is close to the true event rate for any given subgroup. Such models are reliable across heterogeneous populations and satisfy strong notions of algorithmic fairness. However, the task of auditing a model for strong calibration is well known to be difficult, particularly for machine learning (ML) algorithms, due to the sheer number of potential subgroups. As such, common practice is to assess calibration only with respect to a few predefined subgroups. Recent developments in goodness-of-fit testing offer potential solutions but are not designed for settings with weak signal or where the poorly calibrated subgroup is small, as they either overly subdivide the da...
We describe a flexible family of tests for evaluating the goodness of fit (calibration) of a pre-spe...
When deployed in the real world, machine learning models inevitably encounter changes in the data di...
A much-studied issue is the extent to which the confidence scores provided by machine learning algor...
We introduce a framework for calibrating machine learning models so that their predictions satisfy e...
In safety-critical applications a probabilistic model is usually required to be calibrated, i.e., to...
Moderate calibration, the expected event probability among observations with predicted probability z...
Fair calibration is a widely desirable fairness criterion in risk prediction contexts. One way to mea...
Objective: Calibrated risk models are vital for valid decision support. We define four levels of cal...
The deployment of machine learning classifiers in high-stakes domains requires well-calibrated confi...
Probability predictions from binary regressions or machine learning methods ought to be calibrated: ...
A long-noted difficulty when assessing calibration (or reliability) of forecasting systems is that c...
Learning probabilistic classification and prediction models that generate accurate probabilities is ...
With model trustworthiness being crucial for sensitive real-world applications, practitioners are pu...
Risk prediction models can translate genetic association findings for clinical decision-making. M...