Abstract. We propose a novel approach for estimating the size of the training sets needed to construct valid models in machine learning and data mining. We aim to provide a good representation of the underlying population without making any distributional assumptions. Our technique is based on computing the standard deviation of the χ² statistics of a series of samples. When successive statistics are relatively close, we assume that the samples adequately represent the true underlying distribution of the population, and that models learned from these samples will behave almost as well as models learned on the entire population. We validate our results by experiments involving classifiers of various levels of...
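The idea in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure, which the truncated abstract does not specify. The sketch assumes a single categorical attribute coded as integers, a uniform reference distribution for the χ² statistic, a statistic divided by the sample size so that values are comparable across sizes, and a hypothetical tolerance `tol` on the standard deviation. The names `chi2_per_observation` and `estimate_sample_size` are illustrative only.

```python
import numpy as np


def chi2_per_observation(sample, n_categories):
    """Pearson chi-square of a sample against a uniform reference, divided by n.

    Dividing by n is an assumption of this sketch: it makes the statistic
    comparable across sample sizes.
    """
    n = len(sample)
    observed = np.bincount(sample, minlength=n_categories)
    expected = np.full(n_categories, n / n_categories)
    return float(((observed - expected) ** 2 / expected).sum() / n)


def estimate_sample_size(population, n_categories, sizes,
                         repeats=30, tol=0.05, seed=0):
    """Return the first size whose repeated samples give stable statistics.

    For each candidate size, draw `repeats` samples without replacement,
    compute the statistic on each, and stop once the standard deviation of
    those statistics falls below `tol` (a hypothetical threshold).
    Returns None if no candidate size qualifies.
    """
    rng = np.random.default_rng(seed)
    for n in sizes:
        stats = [
            chi2_per_observation(
                rng.choice(population, size=n, replace=False), n_categories
            )
            for _ in range(repeats)
        ]
        if np.std(stats) < tol:
            return n
    return None


# Usage on synthetic data: a skewed four-category population (made up here).
population = np.random.default_rng(1).choice(
    4, size=20_000, p=[0.5, 0.3, 0.15, 0.05]
)
sizes = [50, 200, 800, 3200, 12800]
chosen = estimate_sample_size(population, 4, sizes)
print("smallest stable sample size:", chosen)
```

The spread of the statistic shrinks as the sample size grows, so the loop returns the smallest candidate at which repeated samples agree to within the tolerance. A stricter `tol` or more `repeats` trades computation for a more conservative estimate.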
Abstract. Background: We consider the problem of designing a study to develop a predictive classifier ...
Ensembles of machine learning models yield improved system performance as well as robust and interpr...
One of the fundamental machine learning tasks is that of predictive classification. Given that organ...
The main objective of this paper is to investigate the relationship between the size of training sam...
We consider simulation studies on supervised learning which measure the performance of a classifica...
Learning models used for prediction are mostly developed without paying much attention to ...
As machine learning gains significant attention in many disciplines and research communities, the v...
Many of today's large data sets must be reduced in size before invoking inductive algorithms, due to...
We study the interaction between input distributions, learning algorithms and finite sample sizes in...
We study the interaction between input distributions, learning algorithms, and finite sample sizes ...
For large, real-world inductive learning problems, the number of training examples often must be lim...
Determining the optimal amount of training data for machine learning algorithms is a critical task i...
This thesis explores one of the most fundamental questions in Machine Learning, namely, how should t...
For each model and each size Ntr of the training data, we sampled 150 training data sets with input ...
Many simulation data sets are so massive that they must be distributed among disk farms attached to ...