Hierarchical topic modeling is a potentially powerful instrument for determining topical structures of text collections that additionally allows constructing a hierarchy representing the levels of topic abstractness. However, parameter optimization in hierarchical models, which includes finding an appropriate number of topics at each level of hierarchy, remains a challenging task. In this paper, we propose an approach based on Renyi entropy as a partial solution to the above problem. First, we introduce a Renyi entropy-based metric of quality for hierarchical models. Second, we propose a practical approach to obtaining the “correct” number of topics in hierarchical topic models and show how model hyperparameters should be tuned for that pur...
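The excerpt does not define the authors' Rényi entropy-based quality metric, so the sketch below is not their method. It only computes the standard order-q Rényi entropy of one discrete distribution, such as a single topic's word distribution, which is the quantity such a metric builds on. The function name `renyi_entropy` and the use of natural logarithms are illustrative assumptions. How the paper aggregates entropy across topics and hierarchy levels, or uses it to pick the number of topics, is not shown here.

```python
import math

def renyi_entropy(probs, q):
    """Standard Rényi entropy of order q (natural log) of a discrete distribution.

    Hypothetical helper for illustration; not the paper's hierarchical metric.
    `probs` may be unnormalized non-negative weights; they are normalized here.
    """
    if q <= 0:
        raise ValueError("order q must be positive")
    total = sum(probs)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalize and drop zero-probability entries (they contribute nothing).
    p = [x / total for x in probs if x > 0]
    if math.isclose(q, 1.0):
        # The q -> 1 limit of Rényi entropy is Shannon entropy.
        return -sum(x * math.log(x) for x in p)
    # H_q(p) = log(sum_i p_i^q) / (1 - q)
    return math.log(sum(x ** q for x in p)) / (1.0 - q)

# Example: a peaked word distribution has lower order-2 entropy than its Shannon entropy.
topic_words = [0.7, 0.2, 0.1]
print(renyi_entropy(topic_words, 1.0), renyi_entropy(topic_words, 2.0))
```

For a uniform distribution every order gives the same value (log of the support size). For non-uniform distributions the entropy does not increase as q grows, which is why the choice of q affects how strongly a metric emphasizes dominant words.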
This study describes a method for constructing a causality model from text data, such as review data...
We develop a nested hierarchical Dirichlet process (nHDP) for hierarchical topic modeling. The nHDP ...
Topic modeling is a generalization of clustering that posits that observations (words in a document)...
Topic modeling is a popular technique for clustering large collections of text documents. A variety ...
Supervised hierarchical topic modeling and unsupervised hierarchical topic modeling are usually used...
The sizes of modern digital libraries have grown beyond our capacity to comprehend manually. Thus we...
We study the problem of topic modeling in corpora whose documents are organized in a multi-level hie...
With the vast amount of information available on the Internet today, helping users find relevant con...
Topic models such as latent Dirichlet allocation (LDA) and hierarchical Dirichlet processes (HDP) ar...
Much of human knowledge sits in large databases of unstructured text. Leveraging this knowledge requ...
Topic models such as latent Dirichlet allocation (LDA) and hierarchical Dirichlet processes (HDP) ar...
The four-level pachinko allocation model (PAM) (Li & McCallum, 2006) represents correlations among t...
Nonparametric topic models based on hierarchical Dirichlet processes (HDPs) allow for the number of ...
Automated generation of high-quality topical hierarchies for a text collection is a dream problem in...
Most nonparametric topic models such as Hierarchical Dirichlet Processes, when viewed as ...