One recently proposed criterion for separating two datasets in discriminant analysis is to use a hyperplane that minimises the sum of distances to it from all the misclassified data points. Here all distances are measured with respect to some fixed norm, and a point is misclassified if it lies on the wrong side of the hyperplane, that is, in the wrong halfspace. In this paper we study the problem of determining such an optimal halfspace when points are distributed according to an arbitrary random vector X in R^d. In the unconstrained case in dimension d, we prove that any optimal separating halfspace always balances the misclassified points. Moreover, under polyhedrality assumptions on the support of X, there always exists an optimal...