OBJECTIVE: Growing awareness of how incomplete data affects learning and classification accuracy has led to an increasing number of missing data techniques. This paper investigates the robustness and accuracy of seven popular techniques for tolerating incomplete training and test data, examining how different patterns, proportions, and mechanisms of missing data affect the resulting tree-based models. METHOD: The seven missing data techniques were compared by artificially simulating different proportions, patterns, and mechanisms of missing data using twenty-one complete (i.e. with no missing values) datasets obtained from the UCI repository of machine learning databases [Blake and Merz, 1998]. A 4-way repeated measures design was...
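As a rough illustration of what artificially simulating missing data on a complete dataset can look like, the sketch below removes a fixed proportion of cells uniformly at random. This corresponds to the missing-completely-at-random (MCAR) mechanism only, which is an assumption here: the abstract does not say which mechanisms or patterns the study used. The function name `inject_mcar`, the toy table, and the use of `None` as the missing marker are all hypothetical.

```python
import random


def inject_mcar(rows, proportion, seed=0):
    """Return a copy of `rows` with a share of cells replaced by None.

    Cells are chosen uniformly at random across the whole table, so the
    chance of a value being missing does not depend on any data value
    (an MCAR-style mechanism, assumed for illustration).
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(rows), len(rows[0])
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    n_missing = round(proportion * len(cells))

    masked = [list(row) for row in rows]  # copy; the input stays complete
    for i, j in rng.sample(cells, n_missing):
        masked[i][j] = None
    return masked


# Hypothetical complete dataset: 5 instances, 4 numeric attributes.
complete = [[float(4 * i + j) for j in range(4)] for i in range(5)]
incomplete = inject_mcar(complete, 0.25)
n_none = sum(v is None for row in incomplete for v in row)
```

Repeating the call over several proportions and seeds would give the kind of repeated conditions a comparison across techniques needs. Other mechanisms would require making the choice of cells depend on observed or unobserved values.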
The substitution of missing values, also called imputation, is an important data preparation task fo...
Missing data is common in real-world studies and can create issues in statistical inference. Discard...
When exploring missing data techniques in a realistic scenario, the current literature is limited: m...
© 2015 Elsevier Inc. The goal is to investigate the prediction performance of tree-based techniques ...
Decision Trees (DTs) have been recognized as one of the most successful formalisms for knowledge rep...
There are many different missing data methods used by classification tree algorithms, but few studie...
In real-life situations, we often encounter data sets containing missing observations. Statistical m...
In many application settings, the data have missing entries which make analysis challenging. An abun...
We propose a simple and effective method for dealing with missing data in decision trees used for cl...
Much work has studied the effect of different treatments of missing values on model induction, but l...