A major challenge in mining topics from a large corpus is the quality of the constructed topics. While phrase-generating approaches generally produce high-quality output, they do not scale well with the size of the data. Thus, state-of-the-art solutions usually either rely on scalable unigram-generating methods, which do not produce high-quality human-readable topics, or are forced to use external knowledge bases. Furthermore, while document collections naturally contain topics at different levels of granularity (general vs. specific), very few traditional methods focus on generating high-quality hierarchical topic structures. This dissertation presents a series of approaches that directly address these challenges of gen...
Uncovering the topics in short-text corpora has become increasingly important with the bursty devel...
In today's information society, we are soaked with overwhelming amounts of natural-language text dat...
There is an increasing interest in automating creation of semantic structures, especially topic maps...
While most topic modeling algorithms model text corpora with unigrams, human interpretation often re...
A phrase is a natural, meaningful, and essential semantic unit. In topic modeling, visualizing phras...
This thesis proposes a novel model for automatically generating a topic map for a document corpus with n...
A lot of digital ink has been spilled on "big data" over the past few years, which is often characte...
1. Introduction to bringing structure to text 2. Mining phrase-based and entity-enriched topical hie...
The “big data” era is characterized by an explosion of information in the form of digital data colle...
A phrase is a natural, meaningful, essential semantic unit. In topic modeling, visualizing phrases f...
Most topic models, such as latent Dirichlet allocation, rely on the bag of words assumption. However...
Topic extraction from documents has become increasingly important due to its effectiveness in many ...
Topic modeling algorithms, such as LDA, find topics, hidden structures, in document corpora in an un...
A sentence is an integral unit of semantic nature, context and significance. Visualizing sentences f...
It is crucial in many information systems to organize short text segments, such as keywords in docum...