ToPIC (Tuning of Parameters for Inference of Concepts) is a distributed self-tuning engine that clusters collections of textual data into correlated groups of documents through topic modeling, specifically Latent Dirichlet Allocation (LDA). ToPIC includes automatic strategies that relieve the end user of the burden of selecting proper parameter values for the overall analytics process. Its current implementation runs on Apache Spark, a state-of-the-art distributed computing framework. As a case study, ToPIC has been validated on three real collections of textual documents characterized by different distributions. The experimental results show the effectiveness and efficiency of the proposed solution in analyzing collections of documents without tuning algorith...
Topic modeling has been used widely to extract the structures (topics) in a collection (corpus) of d...
This work aims at discovering topics in a text corpus and classifying the most relevant terms for ea...
This paper assesses topic coherence and human topic ranking of uncovered latent topics from scientif...
Thesis (Master's)--University of Washington, 2014. In their 2001 work Latent Dirichlet Allocation, Ble...
Much of human knowledge sits in large databases of unstructured text. Leveraging this knowledge requ...
Background: Unstructured and textual data is increasing rapidly and Latent Dirichlet Alloca...
Latent Dirichlet Allocation (LDA) is a popular machine-learning technique that identifies latent str...
Context: Latent Dirichlet Allocation (LDA) has been successfully used in the literature to extract t...
In today's digital world, customers give their opinions on a product that they have purchased online...
Natural Language Processing is a complex method of data mining the vast trove of documents created a...
This paper proposes PASTA (PArameter-free Solutions for Textual Analysis), a large scale engine prov...
Large scale library digitization projects such as the Open Content Alliance are producing vast quant...
Latent Dirichlet Allocation (LDA) is a probability model for grouping hidden topics in documents by ...
In this paper, I apply Latent Dirichlet Allocation (LDA) to cluster 100,000 health-related articles u...
Topic modeling is a generalization of clustering that posits that observations (words in a document)...