While most topic modeling algorithms model text corpora with unigrams, human interpretation often relies on inherent grouping of terms into phrases. As such, we consider the problem of discovering topical phrases of mixed lengths. Existing work either performs post-processing on the results of unigram-based topic models, or utilizes complex n-gram-discovery topic models. These methods generally produce low-quality topical phrases or suffer from poor scalability on even moderately sized datasets. We propose a different approach that is both computationally efficient and effective. Our solution combines a novel phrase mining framework to segment a document into single and multi-word phrases, and a new topic model that operates on the induce...
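To make the idea of segmenting a document into single and multi-word phrases concrete, here is a minimal sketch. It is not the procedure from the paper, which the abstract does not specify; it assumes a much simpler rule in which adjacent tokens are merged whenever their bigram is frequent enough in the corpus. The function names `count_bigrams` and `segment`, the `min_support` threshold, and the toy corpus are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def count_bigrams(docs):
    """Count adjacent token pairs across all tokenized documents."""
    counts = Counter()
    for tokens in docs:
        counts.update(zip(tokens, tokens[1:]))
    return counts


def segment(tokens, bigram_counts, min_support=2):
    """Greedily partition tokens into phrases, scanning left to right.

    Assumption (not from the paper): a token joins the current phrase when
    the pair (last token of the phrase, next token) occurs at least
    min_support times in the corpus. Otherwise a new phrase starts.
    """
    phrases = []
    current = [tokens[0]] if tokens else []
    for tok in tokens[1:]:
        if bigram_counts[(current[-1], tok)] >= min_support:
            current.append(tok)
        else:
            phrases.append(tuple(current))
            current = [tok]
    if current:
        phrases.append(tuple(current))
    return phrases


# Toy corpus, invented for this example.
docs = [
    "support vector machine for text".split(),
    "a support vector machine classifier".split(),
    "text classifier with topic model".split(),
]
counts = count_bigrams(docs)
partition = segment(docs[0], counts)
print(partition)
```

On this toy corpus the first document is split into one three-word phrase and two single-word phrases. A topic model could then, in principle, treat each tuple in the partition as one unit instead of treating every token independently.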
Topic modeling algorithms, such as LDA, find topics, hidden structures, in document corpora in an un...
The goal of topic detection or topic modelling is to uncover the hidden topics in a large corpus. It...
Addressing the problem of information overload, automatic multi-document summarization (MDS) has bee...
A phrase is a natural, meaningful, and essential semantic unit. In topic modeling, visualizing phras...
A phrase is a natural, meaningful, essential semantic unit. In topic modeling, visualizing phrases f...
One of the major challenges of mining topics from a large corpus is the quality of the constructed t...
Most topic models, such as latent Dirichlet allocation, rely on the bag of words assumption. However...
Phrase snippets of large text corpora like news articles or web search results offer great insight a...
A sentence is an integral unit of semantic nature, context and significance. Visualizing sentences f...
Research Session 41: Data Mining, Copy Detection and Data Publication. Large text corpora with news, c...
In clinical documents, medical terms are often expressed in multi-word phrases. Traditional topic mo...
Abstract. This paper presents a topic model that captures the temporal dynamics in the text data al...
There are many popular models available for classification of documents like Naïve Bayes Classifier...
Most of the popular topic models (such as Latent Dirichlet Allocation) have an underlying assumption...