<p>The full dataset is a gene expression matrix with 8,000 features (the genes) as rows and 30 samples (the patients) as columns. For each patient, the outcome (poor or good) is given (1). The dataset is randomly divided into a training and a test set (2). Within the training set, genes are ranked by how different they are between patients with poor and good outcome (3). The most different genes are selected (4). They are used to train a machine learning classifier on the training set (5). After training, the classifier is asked to predict the outcome of the test set patients (6). The predicted outcome is compared with the true outcome and the number of correctly classified patients is noted (7). Steps 2–7 are repeated 1,000 times, and the ...