We consider the task of classification in the high dimensional setting where the number of features of the given data is significantly greater than the number of observations. To accomplish this task, we propose a heuristic, called sparse zero-variance discriminant analysis, for simultaneously performing linear discriminant analysis and feature selection on high dimensional data. This method combines classical zero-variance discriminant analysis, where discriminant vectors are identified in the null space of the sample within-class covariance matrix, with penalization applied to induce sparse structures in the resulting vectors. To approximately solve the resulting nonconvex problem, we develop a simple algorithm based on the alternating di...