(A) Workflow for the prediction of AET outcome based on genomic unitig diversity. The one-hot-encoded unitig data was divided into a training and a validation set, stratified by the outcome proportions. Feature correlation filters were applied to the training data to remove redundant features, reducing the number of features from >500k to 4800. We used recursive feature elimination (RFE) within the nested cross-validation (NCV) loops to further reduce the number of features in a model-dependent manner. Models were built separately without population structure control (nPSC) and with population structure control (PSC), the latter implemented by blocking the data on BAPS groups. The feature combinations obtained during NCV were used to fit new pr...
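Two steps of the workflow above, the correlation filter fitted on training data and the blocking of folds by population group, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the 0.9 threshold, the greedy keep-first rule, the leave-one-group-out blocking and the function names `correlation_filter` and `blocked_folds` are all hypothetical choices that the caption does not specify.

```python
import numpy as np


def correlation_filter(X_train, threshold=0.9):
    """Greedy filter: keep a feature only if its absolute Pearson
    correlation with every already-kept feature is below `threshold`.

    The threshold value is an assumption; the source does not state one.
    Returns the column indices (into X_train) of the kept features.
    """
    X = np.asarray(X_train, dtype=float)
    # Constant columns have undefined correlation; drop them up front.
    varying = np.flatnonzero(X.std(axis=0) > 0)
    if varying.size == 0:
        return varying
    corr = np.atleast_2d(np.abs(np.corrcoef(X[:, varying], rowvar=False)))
    kept = []
    for j in range(len(varying)):
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return varying[kept]


def blocked_folds(groups):
    """Yield (train_idx, test_idx) pairs in which each group (for example
    a BAPS cluster) is held out as a whole, so that no group appears on
    both sides of a split. Leave-one-group-out is one possible blocking
    scheme, assumed here for illustration."""
    groups = np.asarray(groups)
    for g in np.unique(groups):
        test_idx = np.flatnonzero(groups == g)
        train_idx = np.flatnonzero(groups != g)
        yield train_idx, test_idx
```

The filter is computed from the training matrix only, and the returned indices are then used to subset both the training and the validation matrix, so that no information from the validation set influences which features survive.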
(A) Area under the curve (AUC) values for the training and test dataset obtained during the PSC NCV ...
Compared to univariate analysis of genome-wide association (GWA) studies, machine learning–based mod...
Background: A random multiple-regression model that simultaneously fit all allele substitution effects...
<p>The full dataset is a gene expression matrix with 8,000 features (the genes) as rows and 30 sampl...
Chronic Pseudomonas aeruginosa (Pa) lung infections are the leading cause of mortality among cystic ...
A count matrix undergoes pre-processing, including normalization and filtering. The data is randomly...
(A) Core genome phylogeny with tips colored according to AET outcome. The annotation rows correspond...
Additional file 1. Simulated (animal breeding) dataset. Includes four txt files: one for the groupin...
Table S1 displays the effect of training set size on prediction ability when performing cross-valida...
In many application areas, predictive models are used to support or make important decisions. There ...
<p>For each combination of feature extraction method and secondary data source and each pair of data...
We fit 500 models with 500 subsets of 25 randomly selected features from the uncorrelated 4800 featu...
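The random-subset procedure described above can be sketched as below. This is only an illustration under assumptions: the entry does not say which model was fitted or how it was scored, so a least-squares linear scorer and a rank-based AUC stand in as placeholders, and the names `random_feature_subsets`, `auc` and `subset_baseline` are hypothetical.

```python
import numpy as np


def random_feature_subsets(n_features, subset_size, n_subsets, seed=0):
    """Draw `n_subsets` index subsets, each sampled without replacement
    from range(n_features)."""
    rng = np.random.default_rng(seed)
    return [rng.choice(n_features, size=subset_size, replace=False)
            for _ in range(n_subsets)]


def auc(y_true, scores):
    """Rank-based AUC (Mann-Whitney statistic); ties count as one half."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size


def subset_baseline(X_train, y_train, X_test, y_test, subsets):
    """Fit one placeholder linear scorer per feature subset and return
    the test AUC of each. The model choice is an assumption."""
    aucs = []
    for idx in subsets:
        A = np.column_stack([np.ones(len(X_train)), X_train[:, idx]])
        coef, *_ = np.linalg.lstsq(A, np.asarray(y_train, float), rcond=None)
        B = np.column_stack([np.ones(len(X_test)), X_test[:, idx]])
        aucs.append(auc(y_test, B @ coef))
    return np.array(aucs)


# Sizes taken from the text: 500 subsets of 25 features out of 4800.
subsets = random_feature_subsets(n_features=4800, subset_size=25, n_subsets=500)
```

The resulting distribution of scores across the 500 subsets gives a reference against which a model built on a deliberately selected feature set of the same size can be compared.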
This record contains the training, test and validation datasets used to train and evaluate the machi...