We present two scalable model-based clustering systems based on a Gaussian mixture model with independent attributes within clusters. They first summarize the data into sub-clusters and then generate Gaussian mixtures from the sub-clusters' clustering features using a new algorithm, EMACF. EMACF approximates the aggregate behavior of each sub-cluster of data items in the Gaussian mixture model, and it provably converges. Experiments show that our clustering systems run one to two orders of magnitude faster than the traditional EM algorithm with little loss of accuracy.
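To make the two-phase idea concrete, here is a minimal sketch of fitting a Gaussian mixture with independent attributes from sub-cluster summaries instead of raw data items. It is an illustration under stated assumptions, not the EMACF update rules themselves: the abstract does not give them, so the E-step below simply scores each sub-cluster by the average log-density of its items, and every name here (`clustering_features`, `em_on_features`, the farthest-first initialization) is hypothetical.

```python
import numpy as np

def clustering_features(X, labels):
    """Summarize each sub-cluster by item count, per-attribute mean, and mean of squares."""
    ids = np.unique(labels)
    n = np.array([np.sum(labels == i) for i in ids], dtype=float)
    mean = np.array([X[labels == i].mean(axis=0) for i in ids])
    sq = np.array([(X[labels == i] ** 2).mean(axis=0) for i in ids])
    return n, mean, sq

def em_on_features(n, mean, sq, k, iters=50, eps=1e-6):
    """EM for a diagonal-covariance Gaussian mixture using only sub-cluster summaries.

    Assumption: all items of a sub-cluster share one responsibility vector,
    computed from the average log-density of the sub-cluster's items.
    """
    # Deterministic farthest-first initialization of the component means.
    idx = [int(np.argmax(n))]
    for _ in range(k - 1):
        dist = np.min(((mean[:, None, :] - mean[idx][None]) ** 2).sum(axis=2), axis=1)
        idx.append(int(np.argmax(dist)))
    mu = mean[idx].copy()
    # Start every component from the global per-attribute variance.
    g_mean = np.average(mean, axis=0, weights=n)
    g_var = np.maximum(np.average(sq, axis=0, weights=n) - g_mean ** 2, eps)
    var = np.tile(g_var, (k, 1))
    pi = np.full(k, 1.0 / k)

    for _ in range(iters):
        # E-step: average of (x - mu)^2 over a sub-cluster is sq - 2*mu*mean + mu^2.
        dev = sq[:, None, :] - 2 * mean[:, None, :] * mu[None] + mu[None] ** 2
        log_p = np.log(pi)[None] - 0.5 * np.sum(
            np.log(2 * np.pi * var)[None] + dev / var[None], axis=2
        )
        log_p -= log_p.max(axis=1, keepdims=True)  # stabilize before exponentiating
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: weight each sub-cluster by responsibility times item count.
        w = r * n[:, None]
        nk = np.maximum(w.sum(axis=0), eps)
        pi = nk / nk.sum()
        mu = (w.T @ mean) / nk[:, None]
        var = np.maximum((w.T @ sq) / nk[:, None] - mu ** 2, eps)
    return pi, mu, var
```

Because each iteration touches only the sub-cluster summaries, its cost depends on the number of sub-clusters rather than the number of data items, which is the source of the speedup described above. How the sub-clusters are formed is left open here; any summarization step that yields counts, means, and mean squares per sub-cluster can feed `em_on_features`.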
Variable selection for cluster analysis is a difficult problem. The difficulty originates not only f...
We consider the problem of clustering data points in high dimensions, i.e. when the number of data p...
In multivariate datasets, multiple clustering solutions can be obtained, based on different subsets ...
The scalability problem in data mining involves the development of methods for handling large databa...
In this paper, we propose EMACF (Expectation-Maximization Algorithm for Clustering Features) to gen...
This paper deals with clustering models based on Gaussian mixtures. Parameters are ...
We present an algorithm for generating a mixture model from a data set by converting the data into a...
Due to the existence of a large number of sample data which obey the Gaussian distribution, GMM (Gaus...
Cluster analysis faces two problems in high dimensions: first, the “curse of dimensionality” that ...
The use of clustering systems is very important in those real-world applications where an efficient, ...
Agglomerative hierarchical clustering methods based on Gaussian probability models have recently sho...
Finite mixture models are being increasingly used to model the distributions of a wide variety of ra...