This paper is a preliminary (pre-review) version of a sublinear variational algorithm for isotropic Gaussian mixture models (GMMs). Further developments of the algorithm for GMMs with diagonal covariance matrices (instead of isotropic clusters) and their corresponding benchmarking results have been published in TPAMI (doi:10.1109/TPAMI.2021.3133763) in the paper "A Variational EM Acceleration for Efficient Clustering at Very Large Scales". We kindly refer the reader to the TPAMI paper instead of this much earlier arXiv version (the TPAMI paper is also open access). Publicly available source code accompanies the paper (see https://github.com/variational-sublinear-clustering). Please note that the TPAMI paper does not contain the b...
Clustering has been a subject of extensive research in data mining, pattern recognition, and other a...
The first part of this thesis is concerned with Sparse Clustering, which assumes that a potentially ...
Gaussian mixture models (GMM) are the most widely employed approach to perform model-based clusterin...
Peer reviewed. Motivated by the poor performance (linear complexity) of the EM algorithm in clustering...
Finite Gaussian mixture models provide a powerful and widely employed probabilistic approach for clu...
We consider the problem of clustering data points in high dimensions, i.e. when the number of data p...
Clustering conceptually reveals its full value when the dataset size increases considerably, since ...
We propose a new Gaussian clustering method named EM-FDA for feature extraction in high dimensional ...
The scalability problem in data mining involves the development of methods for handling large databa...
Peer reviewed. We present a variational Expectation-Maximization algorithm to learn probabilistic mixt...
The Expectation-Maximization (EM) algorithm is a popular and convenient tool for the estimation of G...
Cluster analysis faces two problems in high dimensions: first, the “curse of dimensionality” that ...
Due to the existence of a large amount of sample data that obey the Gaussian distribution, GMM (Gaus...
We introduce a new class of “maximization expectation” (ME) algorithms where we maximize over hidde...
We show that, given data from a mixture of k well-separated spherical Gaussians in ℜ^d, a simple two...