We report our efforts in identifying a set of previous human evaluations in NLP that would be suitable for a coordinated study examining what makes human evaluations in NLP more/less reproducible. We present our results and findings, which include that just 13% of papers had (i) sufficiently low barriers to reproduction, and (ii) enough obtainable information, to be considered for reproduction, and that all but one of the experiments we selected for reproduction were discovered to have flaws that made the meaningfulness of conducting a reproduction questionable. As a result, we had to change our coordinated study design from a reproduce approach to a standardise-then-reproduce-twice approach. Our overall (negative) finding that the great majority...
Against the background of what has been termed a reproducibility crisis in scie...
In this paper we report our reproduction study of the Croatian part of an annotation-based human eva...
One of the challenges in machine learning research is to ensure that presented and published result...
This paper reports results from a reproduction study in which we repeated the human evaluation of th...
Against a background of growing interest in reproducibility in NLP and ML, and as part of an ongoing...
Human assessment remains the most trusted form of evaluation in NLG, but highly diverse approaches a...
In this paper, we present the results of two reproduction studies for the human evaluation origina...
In the last few years, the issue of reproducibility has gained increased attention in many scientifi...
This paper describes and tests a method for carrying out quantified reproducibility assessment (QRA)...
Why are some research studies easy to reproduce while others are difficult? Casting doubt on the acc...
Reproducibility is of utmost concern in machine learning and natural language processing (NLP). In t...