A phrase is a natural, meaningful, and essential semantic unit. In topic modeling, visualizing phrases for individual topics is an effective way to explore and understand unstructured text corpora. Unfortunately, existing approaches predominantly rely on general distributional features between topics and phrases across an entire corpus, while ignoring the impact of domain-level topical distribution. This often loses domain-specific terminology and, as a consequence, weakens the coherence of topical phrases. In this paper, we present CITPM, a novel framework for topical phrase mining. Our framework views a corpus as a mixture of clusters (domains), where each cluster is characterized by documents sharing similar topical distributions. The...
In clinical documents, medical terms are often expressed in multi-word phrases. Traditional topic mo...
We consider a challenging clustering task: the clustering of multi-word terms without document co-occ...
We present an unsupervised method for the generation from a textual corpus of sets of keywords, that...
A phrase is a natural, meaningful, and essential semantic unit. In topic modeling, visualizing phras...
While most topic modeling algorithms model text corpora with unigrams, human interpretation often re...
One of the major challenges of mining topics from a large corpus is the quality of the constructed t...
A sentence is an integral unit of semantic nature, context and significance. Visualizing sentences f...
Most topic models, such as latent Dirichlet allocation, rely on the bag of words assumption. However...
Phrase snippets of large text corpora like news articles or web search results offer great insight a...
Making sense of words often requires simultaneously examining the surrounding context of a term as ...
We propose a method for supporting query refinement using topical term clusters. First, we propose a...
This paper further examines a research hypothesis that syntactic variations ar...
We present a system for mapping the structure of research topics in a corpus. ...
Keyword searching is the most common form of document search on the Web. Many Web publishers manuall...
Research Session 41: Data Mining, Copy Detection and Data Publication. Large text corpora with news, c...