Reducing Features to Improve Code Change-Based Bug Prediction

Shivaji, Shivkumar
Whitehead, Jr., E. James
Akella, Ram
Kim, Sunghun

Publication date

January 2013

DOI

10.1109/TSE.2012.43

ISSN

0098-5589

Journal

IEEE Transactions on Software Engineering (ISSN 0098-5589)

Abstract

Machine learning classifiers have recently emerged as a way to predict the introduction of bugs in changes made to source code files. The classifier is first trained on software history and then used to predict whether an impending change causes a bug. Existing classifier-based bug prediction techniques have two drawbacks: insufficient performance for practical use, and slow prediction times due to a large number of machine-learned features. This paper investigates multiple feature selection techniques that are generally applicable to classification-based bug prediction methods. The techniques discard less important features until optimal classification performance is reached. The total number of features used for training is substantially reduced,...
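
The idea of discarding less important features until classification performance peaks can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `eliminate_features`, the `drop_fraction` parameter, the feature names, and the toy scoring function are all assumptions made for illustration. In practice, the importance ranking would come from a feature selection method and `evaluate` would train and score a real classifier on change history.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def eliminate_features(
    features: Sequence[str],
    importance: Dict[str, float],
    evaluate: Callable[[List[str]], float],
    drop_fraction: float = 0.5,
) -> Tuple[List[str], float]:
    """Discard the least important features step by step and return the
    subset that scored best under `evaluate` (higher is better)."""
    # Most important first, so the tail holds the candidates to discard.
    current = sorted(features, key=lambda f: importance[f], reverse=True)
    best_subset, best_score = list(current), evaluate(list(current))
    while len(current) > 1:
        # Always drop at least one feature so the loop terminates.
        keep = max(1, min(len(current) - 1, int(len(current) * (1 - drop_fraction))))
        current = current[:keep]
        score = evaluate(list(current))
        # Ties go to the smaller subset: fewer features, faster prediction.
        if score >= best_score:
            best_subset, best_score = list(current), score
    return best_subset, best_score


# Hypothetical features of a code change: two informative, two noisy.
importance = {"keyword_if": 0.9, "added_loc": 0.7, "author_id": 0.2, "file_ext": 0.1}
informative = {"keyword_if", "added_loc"}


def toy_score(subset: List[str]) -> float:
    # Stand-in for classifier accuracy: rewards informative features
    # and penalizes noisy ones. A real evaluator would cross-validate.
    hits = sum(1 for f in subset if f in informative)
    noise = len(subset) - hits
    return hits / 2 - 0.1 * noise


selected, score = eliminate_features(list(importance), importance, toy_score)
print(selected, score)
```

With this toy data the loop keeps the two informative features and drops the two noisy ones. The halving schedule is one arbitrary choice; a smaller `drop_fraction` searches more subset sizes at the cost of more evaluations.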