Abstract. Background: We consider the problem of designing a study to develop a predictive classifier from high-dimensional data. A common study design is to split the sample into a training set and an independent test set, where the former is used to develop the classifier and the latter to evaluate its performance. In this paper we address the question of what proportion of the samples should be devoted to the training set. How does this proportion impact the mean squared error (MSE) of the prediction accuracy estimate? Results: We develop a non-parametric algorithm for determining an optimal splitting proportion that can be applied with a specific dataset and classifier algorithm. We also perform a broad simulation study for the purpose of ...
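The question raised in this abstract, how the training proportion affects the MSE of the accuracy estimate, can be illustrated with a small simulation. The sketch below is not the non-parametric algorithm the abstract refers to. It is a minimal illustration under assumptions introduced here: two Gaussian classes that differ only in the first coordinate, a nearest-centroid classifier, a stratified split, and a target defined as the accuracy of the classifier trained on the full sample (approximated on a large independent reference sample). All function names and default parameters are hypothetical.

```python
import random


def make_data(n, dim, shift, rng):
    """Draw n labelled points; class 1 is shifted by `shift` in coordinate 0."""
    data = []
    for i in range(n):
        label = i % 2  # alternate labels so the classes are balanced
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        x[0] += shift * label
        data.append((x, label))
    return data


def fit_centroids(train):
    """Nearest-centroid classifier: return the mean vector of each class."""
    dim = len(train[0][0])
    sums = {0: [0.0] * dim, 1: [0.0] * dim}
    counts = {0: 0, 1: 0}
    for x, y in train:
        counts[y] += 1
        for j, v in enumerate(x):
            sums[y][j] += v
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def accuracy(centroids, data):
    """Fraction of points whose nearest centroid matches their label."""
    correct = 0
    for x, y in data:
        dist = {c: sum((a - b) ** 2 for a, b in zip(x, m))
                for c, m in centroids.items()}
        correct += (min(dist, key=dist.get) == y)
    return correct / len(data)


def stratified_split(data, prop, rng):
    """Split each class separately so both sets contain both classes."""
    train, test = [], []
    for label in (0, 1):
        group = [d for d in data if d[1] == label]
        rng.shuffle(group)
        k = min(len(group) - 1, max(1, round(prop * len(group))))
        train += group[:k]
        test += group[k:]
    return train, test


def mse_by_proportion(props, n=40, dim=5, shift=1.5, reps=100, seed=0):
    """Estimate the MSE of the test-set accuracy for each training proportion."""
    rng = random.Random(seed)
    # Large independent sample used to approximate true accuracy (assumption).
    reference = make_data(1000, dim, shift, rng)
    sq_err = {p: 0.0 for p in props}
    for _ in range(reps):
        data = make_data(n, dim, shift, rng)
        # Target: accuracy of the classifier trained on the full sample.
        target = accuracy(fit_centroids(data), reference)
        for p in props:
            train, test = stratified_split(data, p, rng)
            estimate = accuracy(fit_centroids(train), test)
            sq_err[p] += (estimate - target) ** 2
    return {p: s / reps for p, s in sq_err.items()}


if __name__ == "__main__":
    for p, mse in mse_by_proportion([0.25, 0.5, 0.75]).items():
        print(f"training proportion {p:.2f}: estimated MSE {mse:.4f}")
```

A small training set tends to yield a classifier that is worse than the full-sample one (bias), while a small test set makes the accuracy estimate noisy (variance), so the proportion that minimizes the MSE depends on the sample size, the dimension, and the class separation chosen here.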
Motivation: Measurements are commonly taken from two phenotypes to build a classifier, where the numb...
“The curse of dimensionality” is pertinent to many learning algorithms, and it denotes the drastic ...
Advances in microarray technology have equipped researchers to measure gene expression levels simult...
Background: Data generated using ‘omics’ technologies are characterized by high dimensionality, wher...
The main objective of this paper is to investigate the relationship between the size of training sam...
Classification studies with high-dimensional measurements and relatively small sample sizes are incr...
The datasets that appear in publications are curated and have been split into training, testing and ...
It is common to split a dataset into training and testing sets before fitting a statistical or machi...
In biometric practice, researchers often apply a large number of different methods in a "trial-and-e...
Abstract. Background: The goal of class prediction studies is to develop rules to accurately predict t...
Abstract. We propose a novel approach for the estimation of the size of training sets that are neede...
This thesis concerns the development and mathematical analysis of statistical procedures for classi...
Abstract. Background: In biometric practice, researchers often apply a large number of different metho...
High-dimensional data refers to the situation in which the number of variables included in an analysis appr...
Supervised machine learning methods typically require splitting data into multiple chunks for traini...