Spurred by recent advances on the theoretical analysis of the performances of the data-driven machine learning algorithms, this thesis tackles the performance analysis and improvement of high dimensional data and graph clustering. Specifically, in the first bigger part of the thesis, using advanced tools from random matrix theory, the performance analysis of spectral methods on dense realistic graph models and on high dimensional kernel random matrices is performed through the study of the eigenvalues and eigenvectors of the similarity matrices characterizing those data. New improved methods are proposed and are shown to outperform state-of-the-art approaches. In a second part, a new algorithm is proposed for the detection of heterogeneous ...