Topic modeling is a useful tool in computational social science, digital humanities, biology, and chemistry. A popular topic model is the probabilistic Latent Semantic Indexing (pLSI) model. It assumes that the word-document matrix factorizes into the product of a low-rank word-topic matrix A, and a low-rank topic-document matrixW. The goal is to estimate these matrices. While many algorithms are available for topic modeling, there is relatively little statistical understanding. The first contribution of this thesis is providing rigorous statistical theory for both problems, including the optimal rate of convergence for estimating A, the optimal rate of convergence for estimating W, and an unconventional theory for including "sparsity" in ...
Latent Dirichlet Allocation (LDA) is a popular machine-learning technique that identifies latent str...
Topic modeling is an actively developing field in the statistical analysis of texts [1]. A probabili...
The probabilistic topic model imposes a low-rank structure on the expectation of the corpus matrix. ...
With the development of computer technology and the internet, increasingly large amounts of textual ...
With the development of computer technology and the internet, increasingly large amounts of textual ...
Topic modeling is a well-known approach for document anal-ysis. In this paper, we propose a new mode...
Topic modeling, or identifying the set of topics that occur in a collection of articles, is one of t...
Topic modeling can reveal the latent structure of text data and is useful for knowledge discovery, s...
56 pagesAcross many data domains, co-occurrence statistics about the joint appearance of objects are...
Given the overwhelming quantities of data generated every day, there is a pressing need for tools th...
Given the overwhelming quantities of data generated every day, there is a pressing need for tools th...
Traditional topic model with maximum likelihood estimate inevitably suffers from the conditional ind...
Probabilistic topic models are widely used to discover latent topics in document col-lections, while...
Recently, there has been considerable progress on designing algorithms with provable guarantees - ty...
Probabilistic topic models are widely used to discover latent topics in document col-lections, while...
Latent Dirichlet Allocation (LDA) is a popular machine-learning technique that identifies latent str...
Topic modeling is an actively developing field in the statistical analysis of texts [1]. A probabili...
The probabilistic topic model imposes a low-rank structure on the expectation of the corpus matrix. ...
With the development of computer technology and the internet, increasingly large amounts of textual ...
With the development of computer technology and the internet, increasingly large amounts of textual ...
Topic modeling is a well-known approach for document anal-ysis. In this paper, we propose a new mode...
Topic modeling, or identifying the set of topics that occur in a collection of articles, is one of t...
Topic modeling can reveal the latent structure of text data and is useful for knowledge discovery, s...
56 pagesAcross many data domains, co-occurrence statistics about the joint appearance of objects are...
Given the overwhelming quantities of data generated every day, there is a pressing need for tools th...
Given the overwhelming quantities of data generated every day, there is a pressing need for tools th...
Traditional topic model with maximum likelihood estimate inevitably suffers from the conditional ind...
Probabilistic topic models are widely used to discover latent topics in document col-lections, while...
Recently, there has been considerable progress on designing algorithms with provable guarantees - ty...
Probabilistic topic models are widely used to discover latent topics in document col-lections, while...
Latent Dirichlet Allocation (LDA) is a popular machine-learning technique that identifies latent str...
Topic modeling is an actively developing field in the statistical analysis of texts [1]. A probabili...
The probabilistic topic model imposes a low-rank structure on the expectation of the corpus matrix. ...