The multinomial distribution has been widely used to model count data. To increase clustering efficiency, we use an approximation of Fisher scoring as the learning algorithm, which is more robust to the choice of initial parameter values. Moreover, we consider the generalization of the multinomial model obtained by introducing a Dirichlet prior, which is called the Dirichlet Compound Multinomial (DCM). Even though the DCM can address the burstiness phenomenon of count data, the presence of the Gamma function in its density function usually leads to undesired complications. In this thesis, we use two alternative representations of the DCM distribution to perform clustering based on finite mixture models, where the mixture parameters are estim...
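To illustrate where the Gamma function enters the DCM density, here is a minimal sketch of the standard DCM log-probability, evaluated with log-Gamma terms for numerical stability. It is not the thesis's alternative representation or its learning algorithm; the function name `dcm_log_pmf` and its arguments are hypothetical, chosen only for this example.

```python
import math

def dcm_log_pmf(counts, alpha):
    """Log-probability of a count vector under the standard DCM density.

    counts: non-negative integer counts x_1..x_K (hypothetical argument name)
    alpha:  positive Dirichlet parameters alpha_1..alpha_K
    """
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    n = sum(counts)
    a = sum(alpha)
    # Multinomial coefficient: log n! - sum_k log x_k!
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    # Gamma-function terms contributed by the Dirichlet prior
    log_ratio = math.lgamma(a) - math.lgamma(a + n)
    log_terms = sum(math.lgamma(x + ak) - math.lgamma(ak)
                    for x, ak in zip(counts, alpha))
    return log_coef + log_ratio + log_terms

# With K = 2 and alpha = (1, 1), the DCM reduces to a uniform
# distribution over the n + 1 possible splits of n counts.
p = math.exp(dcm_log_pmf([2, 1], [1.0, 1.0]))
```

Every term involves a Gamma function of the parameters, which is why gradient-based estimation of `alpha` requires digamma and trigamma evaluations and motivates looking for alternative representations.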
In this thesis we present an unsupervised algorithm for learning finite mixture models from multivar...
A notable rise in the amounts of data collected, which were made available to the public, is witness...
The Dirichlet process mixtures (DPM) can automatically infer the model complexity from data. Hence i...
31 pages, 10 figures. Count data is becoming more and more ubiquitous in a wide range of applications,...
In this work, we present an overdispersed count data clustering algorithm, which uses the mesh metho...
34 pages, 11 figures. International audience. Count data is becoming more and more ubiquitous in a wide ...
With a massive amount of data created on a daily basis, the ubiquitous demand for data analysis is u...
Online algorithms allow data instances to be processed in a sequential way, which is important for ...
Count data often appears in natural language processing and computer vision applications. For exampl...
We have designed and implemented a finite mixture model, using the scaled Dirichlet distribution for...
Clustering is one of the fundamental tools for preliminary analysis of data. While most of the clust...
Many of the methods which deal with clustering in matrices of data are based on mathematical techniq...
Cluster analysis is concerned with partitioning cases into clusters such that the cases in a cluster...
Clustering methods are designed to separate heterogeneous data into groups of similar objects suc...
The increased collection of high-dimensional data in various fields has raised a strong interest in ...