The scalability problem in data mining involves the development of methods for handling large databases with limited computational resources. In this paper, we present a two-phase scalable model-based clustering framework: first, a large data set is summarized into sub-clusters; then, clusters are generated directly from the summary statistics of the sub-clusters by a specially designed Expectation-Maximization (EM) algorithm. Taking Gaussian mixture models as an example, we establish a provably convergent EM algorithm, EMADS, which explicitly incorporates the cardinality, mean, and covariance information of each sub-cluster. Combined with different data summarization procedures, EMADS is used to construct two clustering systems: gEMADS and bEMADS. Th...
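The two-phase framework described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not EMADS as published: `grid_summarize` is a hypothetical grid-based summarizer standing in for the paper's data summarization procedures, and `em_on_summaries` assumes that all points in a sub-cluster share one responsibility vector. Under that assumption, each E-step scores a sub-cluster by its average log-density, which depends on its cardinality, mean, and covariance. The function names and the `cell` and `ridge` parameters are illustrative only.

```python
import numpy as np


def grid_summarize(X, cell):
    """Phase 1 (hypothetical grid summarizer, not the paper's procedure):
    bucket points into axis-aligned cells of side `cell` and keep, per
    non-empty cell, its cardinality n_m, mean nu_m and biased (1/n_m)
    covariance S_m."""
    keys = np.floor(X / cell).astype(np.int64)
    _, labels = np.unique(keys, axis=0, return_inverse=True)
    labels = labels.ravel()  # guard against a 2-D inverse in some NumPy versions
    n, nu, S = [], [], []
    for g in range(labels.max() + 1):
        pts = X[labels == g]
        mean = pts.mean(axis=0)
        diff = pts - mean
        n.append(len(pts))
        nu.append(mean)
        S.append(diff.T @ diff / len(pts))
    return np.array(n, dtype=float), np.array(nu), np.array(S)


def em_on_summaries(n, nu, S, K, iters=100, ridge=1e-6):
    """Phase 2: Gaussian-mixture EM that reads only the summaries.
    Assumption: every point of sub-cluster m shares one responsibility
    vector r[m], so the E-step uses the sub-cluster's average log-density."""
    M, D = nu.shape
    N = n.sum()
    Q = S + np.einsum("mi,mj->mij", nu, nu)  # per-point second moments E[x x^T]
    # Deterministic init: densest sub-cluster first, then farthest-first.
    idx = [int(np.argmax(n))]
    while len(idx) < K:
        d2 = ((nu[:, None, :] - nu[idx][None, :, :]) ** 2).sum(-1).min(1)
        idx.append(int(np.argmax(d2)))
    mu = nu[idx].copy()
    g_mean = n @ nu / N
    g_cov = np.einsum("m,mij->ij", n, Q) / N - np.outer(g_mean, g_mean)
    Sigma = np.repeat(g_cov[None] + ridge * np.eye(D), K, axis=0)
    pi = np.full(K, 1.0 / K)
    for _ in range(iters):
        # E-step: log pi_k + average over sub-cluster m of log N(x | mu_k, Sigma_k)
        logp = np.empty((M, K))
        for k in range(K):
            inv = np.linalg.inv(Sigma[k])
            _, logdet = np.linalg.slogdet(Sigma[k])
            d = nu - mu[k]
            maha = np.einsum("mi,ij,mj->m", d, inv, d)  # mean term
            trace = np.einsum("ij,mji->m", inv, S)      # tr(Sigma_k^{-1} S_m): spread term
            logp[:, k] = np.log(pi[k]) - 0.5 * (
                D * np.log(2 * np.pi) + logdet + maha + trace)
        top = logp.max(axis=1, keepdims=True)
        lse = top[:, 0] + np.log(np.exp(logp - top).sum(axis=1))
        r = np.exp(logp - lse[:, None])
        bound = float(n @ lse)  # lower bound on the full-data log-likelihood
        # M-step: weighted sufficient statistics, sub-cluster m counted n_m times
        w = r * n[:, None]
        Nk = w.sum(axis=0) + 1e-12
        pi = Nk / N
        mu = (w.T @ nu) / Nk[:, None]
        for k in range(K):
            second = np.einsum("m,mij->ij", w[:, k], Q) / Nk[k]
            Sigma[k] = second - np.outer(mu[k], mu[k]) + ridge * np.eye(D)
    return pi, mu, Sigma, bound


# Usage: two synthetic 2-D Gaussian blobs, compressed into grid cells first.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(5000, 2)),
               rng.normal(6.0, 1.0, size=(5000, 2))])
n, nu, S = grid_summarize(X, cell=0.5)
pi, mu, Sigma, bound = em_on_summaries(n, nu, S, K=2)
print(len(n), "sub-clusters summarize", len(X), "points")
print(np.round(pi, 2), np.round(mu, 2))
```

In this sketch each sub-cluster enters the E-step through both its mean (the Mahalanobis term) and its covariance (the trace term), so two sub-clusters with the same mean but different spreads can receive different responsibilities. Each iteration touches only the M summaries rather than the raw points, and because both steps maximize the same lower bound (apart from the small `ridge` term), the returned `bound` is expected not to decrease across iterations.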
The massive growth of modern datasets from different sources such as videos, social networks, and se...
Finding clusters in data is a challenging problem, especially when the clusters are of widely v...
Cluster analysis faces two problems in high dimensions: first, the “curse of dimensionality” that ...
In this paper, we propose EMACF (Expectation-Maximization Algorithm for Clustering Features) to gen...
We present two scalable model-based clustering systems based on a Gaussian mixture model with indepe...
Practical statistical data clustering algorithms require multiple data scans to converge. For lar...
In this paper, we propose an efficient and fast EM algorithm for model-based clustering of l...
The Expectation-Maximization (EM) algorithm is a very popular optimization tool in model-based clust...
Data summarization is an essential mechanism to accelerate analytic algorithms on large data sets. I...
Cluster analysis in a large dataset is an interesting challenge in many fields of Science and Engine...
We introduce a new class of “maximization expectation” (ME) algorithms where we maximize over hidde...
Clustering is impacted by the regular increase of sample sizes which provides ...
Clustering very large datasets while preserving cluster quality remains a challenging data-mining ta...