Much research on Machine Learning testing relies on empirical studies to evaluate the proposed techniques and demonstrate their potential. However, in this context empirical results are sensitive to a number of parameters that can adversely affect the outcome of the experiments and potentially lead to wrong conclusions (Type I errors, i.e., incorrectly rejecting the Null Hypothesis). To this end, we survey the related literature and identify 10 commonly adopted empirical evaluation hazards that may significantly impact experimental results. We then perform a sensitivity analysis of 30 influential studies published in top-tier SE venues against our hazard set and demonstrate the criticality of these hazards. Our findings indicate that all 10 hazards we identify have the pot...
Machine learning (ML) provides us with numerous opportunities, allowing ML systems to adapt to new s...
Forming a reliable judgement of a machine learning (ML) model's appropriateness for an application e...
With the increasing adoption of Deep Learning (DL) for critical tasks, such as autonomous driving, t...
For the last decade, deep learning (DL) has emerged as a new effective machine learning approach tha...
Deep Learning (DL) systems are key enablers for engineering intelligent applications due to their ab...
Context: Conducting experiments is central to machine learning research to benchmark, evalu...
“Deep learning” uses Post-Selection—selection of a model after training multiple models using data. ...
Reliable and robust evaluation methods are a necessary first step towards developing machine learnin...
Deep Learning (DL) techniques help software developers thanks to their ability to learn from histori...
This data set contains the results of an extensive, systematic literature review on the use of machi...
Context: Deep learning has proven to be a valuable component in object detection and classification,...
Testing deep learning-based systems is crucial but challenging due to the required time and labor fo...
Deep learning has gained substantial popularity in recent years. Developers mainly rely on libraries...
Assessing the quality of Deep Learning (DL) systems is crucial, as they are increasingly adopted in ...
In recent years, Deep Learning (DL) models have widely been applied to develop safety and security c...