Selecting features from documents that describe user information needs is challenging because of the nature of text, where redundancy, synonymy, polysemy, noise and high dimensionality are common problems. The assumption that clustered documents describe only one topic can be too simple, given that most long documents discuss multiple topics. LDA-based models show significant improvement over cluster-based models in information retrieval (IR). However, the integration of both techniques for feature selection (FS) is still limited. In this paper, we propose an innovative and effective cluster- and LDA-based model for relevance FS. The model also integrates a new extended random set theory to generalise the LDA local weights for document terms. It ...
In machine learning and text mining, topic modeling has been widely adopted. To generat...
We investigate four hierarchical clustering methods (single-link, complete-link, groupwise-average, ...
Many mature term-based or pattern-based approaches have been used in the field of information filter...
It is challenging to discover relevant features from long documents that describe user information n...
Unsupervised topic models, such as Latent Dirichlet Allocation (LDA), are widely used as automated f...
Search algorithms incorporating some form of topic model have a long history in information retrieva...
This thesis presents innovative and effective feature selection models and frameworks to select and ...
Document clustering incorporates a number of data mining techniques, and to achieve good clustering ...
Feature selection methods have been successfully applied to text categorization but seldom applied t...
The integration of topic models into ad hoc retrieval has been studied by many researchers in the pa...
Electronic documents on the Internet are always generated with many kinds of side informati...
Topic modelling methods such as Latent Dirichlet Allocation (LDA) have been successfully applied to ...
Text categorization is the task of automatically assigning unlabeled text documents to some...
Clustering is one of the most researched areas of data mining applications in the contemporary liter...