This work concentrates on mining textual data. In particular, I apply Statistical Machine Learning to document clustering, predictive modeling, and document classification tasks undertaken in three different application domains. I have designed novel statistical Bayesian models for each application domain, as well as derived Markov Chain Monte Carlo (MCMC) algorithms for the model inference. First, I investigate the usefulness of using topic models, such as the popular Latent Dirichlet Allocation (LDA) and its extensions, as a pre-processing feature selection step for unsupervised document clustering. Documents are clustered using the pro- portion of the various topics that are present in each document; the topic proportion vectors are the...
In this paper, I apply latent dirichlet allocation(LDA) to cluster 100,000 health related articles u...
Generative models based on the multivariate Bernoulli and multinomial distributions have been widely...
Recently, a probabilistic topic modelling approach, latent dirichlet allocation (LDA), has been exte...
Latent Dirichlet Allocation (LDA) is a popular machine-learning technique that identifies latent str...
Master of ScienceDepartment of Computer ScienceWilliam HsuThis work describes a comparative study of...
In recent years probabilistic topic models have gained tremendous attention in data mining and natur...
In recent years probabilistic topic models have gained tremendous attention in data mining and natur...
Topic modeling is a type of statistical model for discovering the latent "topics" that occur in a co...
Abstract Graphical models have become the basic framework for topic based probabilistic modeling. Es...
Recent developments in topic modeling for text corpora have incorporated Markov models in the latent...
Thesis (Master's)--University of Washington, 2014In their 2001 work Latent Dirichlet Allocation, Ble...
Probabilistic topic models are machine learning tools for processing and understanding large text d...
We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of d...
The proliferation of large electronic document archives requires new techniques for automatically an...
In this paper we propose a probabilistic model for online document clustering. We use non-parametric...
In this paper, I apply latent dirichlet allocation(LDA) to cluster 100,000 health related articles u...
Generative models based on the multivariate Bernoulli and multinomial distributions have been widely...
Recently, a probabilistic topic modelling approach, latent dirichlet allocation (LDA), has been exte...
Latent Dirichlet Allocation (LDA) is a popular machine-learning technique that identifies latent str...
Master of ScienceDepartment of Computer ScienceWilliam HsuThis work describes a comparative study of...
In recent years probabilistic topic models have gained tremendous attention in data mining and natur...
In recent years probabilistic topic models have gained tremendous attention in data mining and natur...
Topic modeling is a type of statistical model for discovering the latent "topics" that occur in a co...
Abstract Graphical models have become the basic framework for topic based probabilistic modeling. Es...
Recent developments in topic modeling for text corpora have incorporated Markov models in the latent...
Thesis (Master's)--University of Washington, 2014In their 2001 work Latent Dirichlet Allocation, Ble...
Probabilistic topic models are machine learning tools for processing and understanding large text d...
We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of d...
The proliferation of large electronic document archives requires new techniques for automatically an...
In this paper we propose a probabilistic model for online document clustering. We use non-parametric...
In this paper, I apply latent dirichlet allocation(LDA) to cluster 100,000 health related articles u...
Generative models based on the multivariate Bernoulli and multinomial distributions have been widely...
Recently, a probabilistic topic modelling approach, latent dirichlet allocation (LDA), has been exte...