Probability forecasts for binary outcomes, often referred to as probabilistic classifiers or confidence scores, are ubiquitous in science and society, and methods for evaluating and comparing them are in great demand. We propose and study a triptych of diagnostic graphics that focus on distinct and complementary aspects of forecast performance: The reliability diagram addresses calibration, the receiver operating characteristic (ROC) curve diagnoses discrimination ability, and the Murphy diagram visualizes overall predictive performance and value. A Murphy curve shows a forecast's mean elementary scores, including the widely used misclassification rate, and the area under a Murphy curve equals the mean Brier score. For a calibrated forecast...
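As a rough illustration of the Murphy-curve statements in the abstract above, the following Python sketch computes mean elementary scores over a grid of thresholds and checks numerically that the value at threshold 1/2 matches the misclassification rate and that the area under the curve approximates the mean Brier score. The normalization of the elementary score (the factor 2), the tie convention (the event is predicted when the forecast exceeds the threshold), the function names, and the simulated data are all assumptions made for this illustration and are not taken from the paper.

```python
import numpy as np


def elementary_score(p, y, theta):
    """Elementary score at threshold theta (assumed normalization).

    A false alarm (y == 0, p > theta) costs 2 * theta and a miss
    (y == 1, p <= theta) costs 2 * (1 - theta); anything else costs 0.
    Inputs broadcast against each other in the usual NumPy way.
    """
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=int)
    theta = np.asarray(theta, dtype=float)
    false_alarm = (y == 0) & (p > theta)
    miss = (y == 1) & (p <= theta)
    return 2.0 * (theta * false_alarm + (1.0 - theta) * miss)


def murphy_curve(p, y, thetas):
    """Mean elementary score at each threshold in `thetas`."""
    p = np.asarray(p, dtype=float)[None, :]
    y = np.asarray(y, dtype=int)[None, :]
    thetas = np.asarray(thetas, dtype=float)[:, None]
    return elementary_score(p, y, thetas).mean(axis=1)


def brier_score(p, y):
    """Mean squared difference between forecast and binary outcome."""
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean((p - y) ** 2))


# Simulated data (an assumption for illustration): forecasts that are
# calibrated by construction, since y is drawn with probability p.
rng = np.random.default_rng(0)
p = rng.uniform(size=200)
y = rng.binomial(1, p)

# Midpoint grid on [0, 1]; its mean approximates the area under the curve.
n_grid = 5000
thetas = (np.arange(n_grid) + 0.5) / n_grid
curve = murphy_curve(p, y, thetas)
area = float(curve.mean())

# Value of the curve at theta = 1/2 versus the misclassification rate.
score_at_half = float(murphy_curve(p, y, [0.5])[0])
misclassification_rate = float(np.mean((p > 0.5).astype(int) != y))

print("area under Murphy curve:", area)
print("mean Brier score:       ", brier_score(p, y))
print("score at theta = 0.5:   ", score_at_half)
print("misclassification rate: ", misclassification_rate)
```

Under a different normalization of the elementary score, the same relations would hold only up to a constant factor, so the printed agreement depends on the convention assumed here.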
This paper presents a score that can be used for evaluating probabilistic forecasts of multicategory...
This article refers to the study of Mason and Weigel, where the generalized discrimination score D h...
The evaluation of forecast performance plays a central role both in the interpretation and use of fo...
Forecasts are issued as point or probabilistic predictions, and their performance is measured using ...
Probabilistic forecasts of variables measured on a categorical or ordinal scale, such as precipitati...
The reliability diagram is a common diagnostic graph used to summarize and evaluate probabilistic fo...
Throughout science and technology, receiver operating characteristic (ROC) curves and associated are...
In this work, we investigate the reliability of the probabilistic binary forecast. We mathematically...
Model diagnostics and forecast evaluation are two sides of the same coin. A common principle is that...
Probabilistic forecasts in the form of probability distributions over future events have become popu...
Predictions are often probabilities; e.g., a prediction could be for precipitation tomorrow, but wit...
Motivated by the Basel 3 regulations, recent studies have considered joint forecasts of Value-at-Ris...
There are two popular statistical approaches to biomarker evaluation. One models the risk of disea...
We propose the use of signal detection theory (SDT) to evaluate the performance of both probabilisti...
This review article addresses the ROC curve and its advantage over the odds ratio to measure the ass...