Consistently checking the statistical significance of experimental results is the first mandatory step towards reproducible science. This paper presents a hitchhiker's guide to rigorous comparisons of reinforcement learning algorithms. After introducing the concepts of statistical testing, we review the relevant statistical tests and compare them empirically in terms of false positive rate and statistical power as a function of the sample size (number of seeds) and effect size. We further investigate the robustness of these tests to violations of their most common assumptions (normal distributions, same distributions, equal variances). Besides simulations, we compare empirical distributions obtained by running Soft Actor-Critic and Twin-Delayed...
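To make the notion of an empirical false positive rate concrete, here is a minimal sketch that draws two groups of seeds from the same distribution, runs a test on each pair, and counts how often the test wrongly reports a difference. The two-sided permutation test, the function names, and all parameter values (5 seeds per group, 200 trials, 200 permutations, alpha = 0.05) are assumptions chosen for a dependency-free illustration; they are not taken from the paper.

```python
import random


def permutation_test(a, b, n_perm=200, rng=None):
    """Two-sided Monte Carlo permutation test on the difference in means.

    Returns an estimated p-value. Illustrative helper, not the paper's code.
    """
    rng = rng or random.Random(0)
    n_a, n_b = len(a), len(b)
    observed = abs(sum(a) / n_a - sum(b) / n_b)
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_perm):
        # Randomly reassign the pooled scores to the two groups.
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            count += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (count + 1) / (n_perm + 1)


def false_positive_rate(n_seeds=5, n_trials=200, alpha=0.05, seed=0):
    """Fraction of trials in which the test rejects although no true difference exists."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        # Both "algorithms" share the same performance distribution (assumed standard normal).
        a = [rng.gauss(0.0, 1.0) for _ in range(n_seeds)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_seeds)]
        if permutation_test(a, b, rng=rng) < alpha:
            rejections += 1
    return rejections / n_trials


if __name__ == "__main__":
    print(f"Estimated false positive rate: {false_positive_rate():.3f}")
```

A well-calibrated test should yield a rate close to alpha or below it. Drawing the second group from a shifted distribution instead would turn the same loop into an estimate of statistical power for that effect size.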
The mean result of machine learning models is determined by utilizing k-fold cross-validation. The a...
The lottery ticket hypothesis questions the role of overparameterization in supervised deep learning...
In 1988, Langley wrote an influential editorial in the journal Machine Learning titled “Machine...
This article reviews five approximate statistical tests for determining whether one learning algorit...
The central question addressed in this research is “can we define evaluation methodologies that enco...
Recent work in reinforcement learning has focused on several characteristics of learned policies tha...
Although it is a crucial question for the development of machine learning algorithms, there is still...
This paper reviews five statistical tests for determining whether one learning algorithm outperforms...
Non-deterministic measurements are common in real-world scenarios: the performance of a stochastic o...
In this paper we provide empirical data of the performance of the two most commonly used multiobject...
Marginalized importance sampling (MIS), which measures the density ratio between the state-action oc...
The main objective of this paper is to correct the unreasonable and inaccurate criticism of our prev...
Developing state-of-the-art approaches for specific tasks is a major driving force in our research c...
If we changed the rules, would the wise become fools? Different groups formalize reinforcement learn...
Sutton [in his PhD thesis] introduced a reinforcement comparison term into the equations governing c...