Abstract—Empirical studies on software prediction models do not converge with respect to the question “which prediction model is best?” The reason for this lack of convergence is poorly understood. In this simulation study, we have examined a frequently used research procedure comprising three main ingredients: a single data sample, an accuracy indicator, and cross-validation. Typically, these empirical studies compare a machine learning model with a regression model. In our study, we use simulation to compare a machine learning model and a regression model. The results suggest that it is the research procedure itself that is unreliable. This lack of reliability may strongly contribute to the lack of convergence. Our findings thus cast some dou...
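The procedure the abstract describes (a single data sample, an accuracy indicator, and cross-validation) can be illustrated with a small simulation. The sketch below is not the authors' experiment: it assumes a hypothetical population in which effort grows linearly with size under multiplicative noise, uses a k-nearest-neighbour predictor as a stand-in for the machine learning model, ordinary least squares as the regression model, and MMRE (mean magnitude of relative error) as the accuracy indicator. All function names and parameter values are illustrative.

```python
import random
import statistics


def fit_ols(xs, ys):
    """Return (intercept, slope) of a least-squares line through (xs, ys)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def predict_knn(train, x, k=3):
    """Mean y of the k training points whose x is closest to the query."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return statistics.fmean(y for _, y in nearest)


def mmre(actual, predicted):
    """Mean magnitude of relative error (assumed accuracy indicator)."""
    return statistics.fmean(abs(a - p) / a for a, p in zip(actual, predicted))


def cross_validate(sample, n_folds=5):
    """Return (MMRE of OLS, MMRE of k-NN) pooled over all held-out folds."""
    actual, ols_pred, knn_pred = [], [], []
    for f in range(n_folds):
        test = sample[f::n_folds]
        train = [p for i, p in enumerate(sample) if i % n_folds != f]
        a, b = fit_ols([x for x, _ in train], [y for _, y in train])
        for x, y in test:
            actual.append(y)
            ols_pred.append(a + b * x)
            knn_pred.append(predict_knn(train, x))
    return mmre(actual, ols_pred), mmre(actual, knn_pred)


def draw_sample(rng, n=40):
    """One hypothetical data set drawn from the assumed population."""
    sample = []
    for _ in range(n):
        size = rng.uniform(10, 100)
        effort = 5 * size * rng.lognormvariate(0, 0.5)  # always positive
        sample.append((size, effort))
    return sample


def count_wins(n_samples=200, seed=1):
    """Repeat the single-sample procedure and count which model 'wins'."""
    rng = random.Random(seed)
    wins = {"ols": 0, "knn": 0}
    for _ in range(n_samples):
        ols_err, knn_err = cross_validate(draw_sample(rng))
        wins["ols" if ols_err <= knn_err else "knn"] += 1
    return wins


if __name__ == "__main__":
    print(count_wins())
```

Each call to `cross_validate` mimics one published study working from one sample. If the winner changes from sample to sample even though every sample comes from the same population, the single-sample verdict is unstable, which is the kind of unreliability the abstract points to.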
The experiment was conducted 10 times using 10-fold cross-validation performed on the training se...
A potential methodological problem with empirical studies that assess project effort prediction syst...
Many empirical software engineering studies have employed feature selection algorithms to exclude th...
The need for accurate software prediction systems increases as software becomes much larger and more...
Most reported experience with software reliability models is from a project's testing phases, d...
Context: Software engineering has a problem in that when we empirically evaluate competing predict...
BACKGROUND: Prediction, e.g., of project cost, is an important concern in software engineering. PROBLEM...
Model validation methods (e.g., k-fold cross-validation) use historical data to predict how well an ...
Predictive accuracy claims should give explicit descriptions of the steps followed, with access to t...
The mean result of machine learning models is determined by utilizing k-fold cross-validation. The a...
Comparative simulation studies are workhorse tools for benchmarking statistical methods, but if not ...
Many models have been proposed for software reliability prediction, but none of these models could c...
Evaluation of predictive models is a ubiquitous task in machine learning and data mining. Cross-vali...
Statistical methods based on a regression model plus a zero-mean Gaussian process (GP) have been wi...
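Several of the abstracts above rely on k-fold cross-validation, and one of them repeats 10-fold cross-validation 10 times. A minimal sketch of how such repeated splits can be generated, assuming a plain shuffle-then-partition scheme; the function name `repeated_kfold` and its parameters are illustrative and are not taken from any of the listed studies:

```python
import random


def repeated_kfold(n_items, n_folds=10, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    Each repeat reshuffles the item indices, then deals them into
    n_folds disjoint test sets; the remaining items form the training set.
    """
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        order = list(range(n_items))
        rng.shuffle(order)
        for fold in range(n_folds):
            test_idx = order[fold::n_folds]
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            yield repeat, fold, train_idx, test_idx


if __name__ == "__main__":
    for repeat, fold, train_idx, test_idx in repeated_kfold(20, n_folds=5, n_repeats=2):
        print(repeat, fold, sorted(test_idx))
```

Reshuffling before each repeat means the fold boundaries differ between repeats, so averaging an accuracy indicator over all repeats reduces its dependence on one particular partition. It does not remove the dependence on the single underlying data sample.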