We provide a simple and general solution for the discovery of scarce topics in unbalanced short-text datasets, namely, a word co-occurrence network-based model CWIBTD, which can simultaneously address the sparsity and unbalance of short-text topics and attenuate the effect of occasional pairwise occurrences of words, allowing the model to focus more on the discovery of scarce topics. Unlike previous approaches, CWIBTD uses co-occurrence word networks to model the topic distribution of each word, which improves the semantic density of the data space and ensures its sensitivity in identify-ing rare topics by improving the way node activity is calculated and normal-izing scarce topics and large topics to some extent. In addition, using the sam...
As large-scale digital text collections become abundant, the necessity of automatically summarizing ...
In recent years, with the rapid growth of social media, short texts have been very prevalent on the ...
Topic modeling is a suite of algorithms, which aims to discover the hidden structures in large digit...
In our daily life, short texts have been everywhere especially since the emergence of social network...
Short texts are a common source of knowledge, and the extraction of such valuable information is ben...
Statistical topic models such as the Latent Dirichlet Allocation (LDA) have emerged as an attractive...
Streams of short text, such as news titles, enable us to effectively and efficiently learn the real ...
Probabilistic topic models are machine learning tools for processing and understanding large text d...
With the rapid proliferation of social networking sites (SNS), automatic topic extraction from vario...
Recently, there has been an exponential rise in the use of online social media systems like Twitter ...
Topic modeling has been proved to be an effective method for exploratory text mining. It is a common...
International audienceThe most popular topic modelling algorithm, Latent Dirichlet Allocation, produ...
Recent technological advances have radically changed the way we communicate. Today’s communication h...
This is the author accepted manuscript. The final version is available from Springer Verlag via the ...
We present our novel, hyperparameter-free topic modelling algorithm, Community Topic. Our algorithm ...
As large-scale digital text collections become abundant, the necessity of automatically summarizing ...
In recent years, with the rapid growth of social media, short texts have been very prevalent on the ...
Topic modeling is a suite of algorithms, which aims to discover the hidden structures in large digit...
In our daily life, short texts have been everywhere especially since the emergence of social network...
Short texts are a common source of knowledge, and the extraction of such valuable information is ben...
Statistical topic models such as the Latent Dirichlet Allocation (LDA) have emerged as an attractive...
Streams of short text, such as news titles, enable us to effectively and efficiently learn the real ...
Probabilistic topic models are machine learning tools for processing and understanding large text d...
With the rapid proliferation of social networking sites (SNS), automatic topic extraction from vario...
Recently, there has been an exponential rise in the use of online social media systems like Twitter ...
Topic modeling has been proved to be an effective method for exploratory text mining. It is a common...
International audienceThe most popular topic modelling algorithm, Latent Dirichlet Allocation, produ...
Recent technological advances have radically changed the way we communicate. Today’s communication h...
This is the author accepted manuscript. The final version is available from Springer Verlag via the ...
We present our novel, hyperparameter-free topic modelling algorithm, Community Topic. Our algorithm ...
As large-scale digital text collections become abundant, the necessity of automatically summarizing ...
In recent years, with the rapid growth of social media, short texts have been very prevalent on the ...
Topic modeling is a suite of algorithms, which aims to discover the hidden structures in large digit...