Reducing features to improve bug prediction

Shivaji, Shivkumar
Whitehead, E. James
Akella, Ram
Kim, Sunghun

Publication date

January 2009

DOI

10.1109/ASE.2009.76

Abstract

Recently, machine learning classifiers have emerged as a way to predict the existence of a bug in a change made to a source code file. The classifier is first trained on software history data and then used to predict bugs. Two drawbacks of existing classifier-based bug prediction are potentially insufficient accuracy for practical use and the use of a large number of features. This large number of features adversely impacts the scalability and accuracy of the approach. This paper proposes a feature selection technique applicable to classification-based bug prediction. The technique is applied to predict bugs in software changes, and the performance of Naïve Bayes and Support Vector Machine (SVM) classifiers is characterized. © 2009 IEEE
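
The general idea of selecting features before training a bug classifier can be illustrated with a minimal sketch. The abstract does not state which selection criterion or feature representation the paper uses, so everything below is an assumption made for illustration: each change is modeled as a set of tokens, features are ranked by information gain, and a small Bernoulli Naïve Bayes classifier is trained on the top-ranked features only. The toy data and all function names are hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(rows, labels, feature):
    """Reduction in label entropy from knowing whether `feature` is present."""
    base = entropy(list(Counter(labels).values()))
    present = [y for x, y in zip(rows, labels) if feature in x]
    absent = [y for x, y in zip(rows, labels) if feature not in x]
    n = len(labels)
    conditional = sum(
        len(part) / n * entropy(list(Counter(part).values()))
        for part in (present, absent)
        if part
    )
    return base - conditional


def select_features(rows, labels, k):
    """Keep the k features with the highest information gain (ties by name)."""
    vocabulary = sorted(set().union(*rows))
    ranked = sorted(
        vocabulary, key=lambda f: (-information_gain(rows, labels, f), f)
    )
    return ranked[:k]


def train_naive_bayes(rows, labels, features):
    """Bernoulli Naive Bayes with Laplace smoothing over the chosen features."""
    model = {"features": features, "prior": {}, "cond": {}}
    for c in sorted(set(labels)):
        members = [x for x, y in zip(rows, labels) if y == c]
        model["prior"][c] = len(members) / len(rows)
        model["cond"][c] = {
            f: (sum(f in x for x in members) + 1) / (len(members) + 2)
            for f in features
        }
    return model


def predict(model, row):
    """Return the class with the highest log-posterior for one change."""
    best_class, best_score = None, -math.inf
    for c, prior in model["prior"].items():
        score = math.log(prior)
        for f in model["features"]:
            p = model["cond"][c][f]
            score += math.log(p if f in row else 1.0 - p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Hypothetical toy history: each change is a set of tokens; 1 = buggy, 0 = clean.
rows = [
    {"malloc", "free", "if", "return"},
    {"malloc", "strcpy", "if"},
    {"strcpy", "return", "for"},
    {"printf", "if", "return"},
    {"printf", "for", "return"},
    {"printf", "if", "for"},
]
labels = [1, 1, 1, 0, 0, 0]

selected = select_features(rows, labels, k=3)
model = train_naive_bayes(rows, labels, selected)
print("selected features:", selected)
print("prediction:", predict(model, {"malloc", "if", "return"}))
```

In this sketch, tokens such as `if` and `return` appear equally often in buggy and clean changes, so they carry no information gain and are dropped before training. An SVM could be trained on the same reduced feature set in place of the Naïve Bayes model.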