Large scale library digitization projects such as the Open Content Alliance are producing vast quantities of text, but little has been done to organize this data. Subject headings inherited from card catalogs are useful but limited, while full-text indexing is most appropriate for readers who already know exactly what they want. Statistical topic models provide a complementary function. These models can identify semantically coherent ``topics\u27\u27 that are easily recognizable and meaningful to humans, but they have been too computationally intensive to run on library-scale corpora. This paper presents DCM-LDA, a topic model based on Dirichlet Compound Multinomial distributions. This model is simultaneously better able to represent observ...
This work concentrates on mining textual data. In particular, I apply Statistical Machine Learning t...
Purpose: We present a topic classification model using the Dewey Decimal Classification (DDC) as the...
Latent Dirichlet Allocation (LDA) is a popular machine-learning technique that identifies latent str...
Thesis (Master's)--University of Washington, 2014In their 2001 work Latent Dirichlet Allocation, Ble...
Topic modeling is an unsupervised learning task that discovers the hidden topics in a ...
Scalable and effective analysis of large text corpora remains a chal-lenging problem as our ability ...
Much of human knowledge sits in large databases of unstructured text. Leveraging this knowledge requ...
Topic modeling is a type of statistical model for discovering the latent "topics" that occur in a co...
Learning meaningful topic models with massive document collections which contain millions of documen...
textDigital media collections hold an unprecedented source of knowledge and data about the world. Y...
Topic models, such as latent Dirichlet allocation (LDA), can be useful tools for the statistical ana...
The sizes of modern digital libraries have grown beyond our capacity to comprehend manually. Thus we...
Topic models provide a useful tool to organize and understand the structure of large corpora of text...
The main aim of this article is to present the results of different experiments focused on the probl...
Topic modeling analyzes documents to learn meaningful patterns of words. For documents collected in ...
This work concentrates on mining textual data. In particular, I apply Statistical Machine Learning t...
Purpose: We present a topic classification model using the Dewey Decimal Classification (DDC) as the...
Latent Dirichlet Allocation (LDA) is a popular machine-learning technique that identifies latent str...
Thesis (Master's)--University of Washington, 2014In their 2001 work Latent Dirichlet Allocation, Ble...
Topic modeling is an unsupervised learning task that discovers the hidden topics in a ...
Scalable and effective analysis of large text corpora remains a chal-lenging problem as our ability ...
Much of human knowledge sits in large databases of unstructured text. Leveraging this knowledge requ...
Topic modeling is a type of statistical model for discovering the latent "topics" that occur in a co...
Learning meaningful topic models with massive document collections which contain millions of documen...
textDigital media collections hold an unprecedented source of knowledge and data about the world. Y...
Topic models, such as latent Dirichlet allocation (LDA), can be useful tools for the statistical ana...
The sizes of modern digital libraries have grown beyond our capacity to comprehend manually. Thus we...
Topic models provide a useful tool to organize and understand the structure of large corpora of text...
The main aim of this article is to present the results of different experiments focused on the probl...
Topic modeling analyzes documents to learn meaningful patterns of words. For documents collected in ...
This work concentrates on mining textual data. In particular, I apply Statistical Machine Learning t...
Purpose: We present a topic classification model using the Dewey Decimal Classification (DDC) as the...
Latent Dirichlet Allocation (LDA) is a popular machine-learning technique that identifies latent str...