Empirical research in learning algorithms for classification tasks generally requires the use of significance tests. The quality of a test is typically judged on Type I error (how often the test indicates a difference when it should not) and Type II error (how often it indicates no difference when it should). In this paper we argue that the replicability of a test is also of importance. We say that a test has low replicability if its outcome strongly depends on the particular random partitioning of the data that is used to perform it. We present empirical measures of replicability and use them to compare the performance of several popular tests in a realistic setting involving standard learning algorithms and benchmark datasets. Based on ou...
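The notion of replicability described above can be illustrated with a small sketch. This is not the measure defined in the paper: it assumes one simple stand-in, the fraction of pairs of runs whose test outcomes agree, and it uses synthetic per-instance correctness vectors for two hypothetical classifiers rather than real learners. All function names are hypothetical. A real experiment would retrain both learners on every partition; here only the random partitioning changes, which is enough to show how the outcome of a 10-fold paired t-test can vary from run to run.

```python
import random
import statistics
from itertools import combinations

# Two-sided 5% critical value of Student's t with 9 degrees of freedom,
# i.e. the value that matches a paired test over k = 10 folds.
T_CRIT_DF9 = 2.262


def paired_t_reject(diffs, t_crit=T_CRIT_DF9):
    """Return True if a paired t-test on per-fold differences rejects H0."""
    n = len(diffs)
    mean = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0.0:
        # Degenerate case: identical differences in every fold.
        return mean != 0.0
    t = mean / (sd / n ** 0.5)
    return abs(t) > t_crit


def fold_differences(correct_a, correct_b, k, rng):
    """Randomly split instances into k folds; return per-fold accuracy differences."""
    idx = list(range(len(correct_a)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [
        sum(correct_a[j] - correct_b[j] for j in fold) / len(fold)
        for fold in folds
    ]


def consistency(outcomes):
    """Fraction of pairs of runs whose test outcomes agree (assumed measure)."""
    pairs = list(combinations(outcomes, 2))
    return sum(a == b for a, b in pairs) / len(pairs)


# Synthetic data (an assumption for illustration): 1 = instance classified correctly.
rng = random.Random(0)
correct_a = [int(rng.random() < 0.80) for _ in range(500)]
correct_b = [int(rng.random() < 0.77) for _ in range(500)]

# Repeat the same test on 20 different random partitions of the same data.
outcomes = [
    paired_t_reject(fold_differences(correct_a, correct_b, 10, rng))
    for _ in range(20)
]
print(f"rejections: {sum(outcomes)}/20, consistency: {consistency(outcomes):.2f}")
```

A consistency of 1.0 means every partition led to the same conclusion; values well below 1.0 indicate that the conclusion depends on which random partition happened to be drawn.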
Most research in statistical learning (SL) has focused on the mean success rates of participants in ...
The assessment of the performance of learners by means of benchmark experiments is an established ex...
Statistical significance tests are the main tool that IR practitioners use to determine the reliabil...
Replicability of machine learning experiments measures how likely it is that the outcome of one expe...
Statistical significance tests are often used in machine learning to compare the performanc...
This paper reviews five statistical tests for determining whether one learning algorithm outperforms...
Significance testing has become a mainstay in machine learning, with the p value being firmly embedd...
The mean result of machine learning models is determined by utilizing k-fold cross-validation. The a...
This article reviews five approximate statistical tests for determining whether one learning algorit...
There is a well-known problem in Null Hypothesis Significance Testing: many statistically significan...
Replicability, the ability to replicate scientific findings, is a prerequisite for scientific discov...
Much research has been done in the fields of classifier performance evaluation and optimization. Thi...
Practitioners of data mining and machine learning have long observed that the imbalance of classes i...