Abstract: In this paper we propose a fast and efficient EM algorithm for model-based clustering of large databases. Drawing ideas from its stochastic descendant, the Monte Carlo EM algorithm, the method uses only a sub-sample of the entire database per iteration. Starting with smaller samples in the earlier iterations for computational efficiency, the algorithm increases the sample size intelligently towards the end of the run to ensure accurate results. The sample-size updating rule is centered around EM’s highly regarded likelihood-ascent property and increases the sample only when no further improvements are possible based on the current sample. In several simulation studies we show the superiority of A...
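The idea in the abstract can be illustrated with a minimal sketch, under stated assumptions. The abstract does not give the actual updating rule, so everything specific here is hypothetical: the function names `em_step` and `subsampled_em`, the 1-D Gaussian mixture, the doubling factor `growth`, the starting size `n0`, and the tolerance `tol` on the gain in mean log-likelihood. The sketch runs ordinary EM on a random sub-sample and draws a larger sub-sample only once the likelihood stops improving on the current one.

```python
import numpy as np


def em_step(x, w, mu, var):
    """One EM iteration for a 1-D Gaussian mixture on the sample x."""
    # E-step: log of the weighted component densities, shape (n, k).
    log_p = (np.log(w) - 0.5 * np.log(2 * np.pi * var)
             - 0.5 * (x[:, None] - mu) ** 2 / var)
    log_norm = np.logaddexp.reduce(log_p, axis=1)
    resp = np.exp(log_p - log_norm[:, None])
    # M-step: re-estimate weights, means and variances.
    nk = resp.sum(axis=0) + 1e-12
    w = nk / nk.sum()
    mu = resp.T @ x / nk
    var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    # Mean log-likelihood of the sample under the pre-update parameters.
    return w, mu, var, log_norm.mean()


def subsampled_em(data, k, n0=200, growth=2.0, tol=1e-4, max_iter=500, seed=0):
    """EM on a growing sub-sample (illustrative rule, not the paper's)."""
    rng = np.random.default_rng(seed)
    n = min(n0, data.size)
    sample = rng.choice(data, size=n, replace=False)
    w = np.full(k, 1.0 / k)
    mu = np.quantile(sample, (np.arange(k) + 0.5) / k)
    var = np.full(k, sample.var())
    prev = -np.inf
    sizes = [n]
    for _ in range(max_iter):
        w, mu, var, ll = em_step(sample, w, mu, var)
        if ll - prev < tol:
            # No further gain on the current sample.
            if n == data.size:
                break  # already using the full database
            # Assumed rule: grow the sample geometrically and redraw it.
            n = min(int(growth * n), data.size)
            sample = rng.choice(data, size=n, replace=False)
            sizes.append(n)
            prev = -np.inf  # likelihoods on different samples are not comparable
        else:
            prev = ll
    return w, mu, var, sizes
```

Resetting `prev` after each redraw is a deliberate choice in this sketch: the likelihood-ascent property holds only for a fixed sample, so gains are compared only between iterations on the same sub-sample.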
Clustering is often formulated as the maximum likelihood estimation of a mixture model...
Motivated by the poor performance (linear complexity) of the EM algorithm in clustering large data s...
We present two scalable model-based clustering systems based on a Gaussian mixture model with indepe...
In this paper we propose two new EM-type algorithms for model-based clustering. The first algorithm,...
Clustering is an important problem in Statistics and Machine Learning that is usually solved using L...
Practical statistical data clustering algorithms require multiple data scans to converge. For lar...
The scalability problem in data mining involves the development of methods for handling large databa...
Clustering is a fundamental data mining technique. This article presents an improved EM algorithm to...
The Expectation-Maximization (EM) algorithm is a very popular optimization tool in model-based clust...
Clustering is one of the most important techniques used in Data Mining. This article focuses on the ...
In this paper, we propose EMACF (Expectation-Maximization Algorithm for Clustering Features) to gen...
We introduce a new class of “maximization expectation” (ME) algorithms where we maximize over hidde...
Due to the existence of a large number of sample data which obey the Gaussian distribution, GMM (Gaus...