It is well known that connectivity analysis of linked documents provides significant information about the structure of the document space for unsupervised learning tasks. However, the ability to identify distinct clusters of documents based on link graph analysis is proportional to the density of the graph and depends on the availability of the linking and/or linked documents in the collection. In this paper, we present an information-theoretic approach to measuring the significance of individual words based on the underlying link structure of the document collection. This enables us to generate a non-uniform weight distribution over the feature space, which is used to augment the original corpus-based document similarities. The experime...
We propose a graph-based representation of text collections where the nodes are textual units such a...
Traditional techniques of document clustering do not consider the semantic relationships between wor...
A language-independent method for automatic clustering of certain classes of documents is described....
Abstract. It is well known that connectivity analysis of linked documents provides significant infor...
Abstract: In this paper, a unified framework for clustering documents based on vocabulary overlap an...
There are many scenarios where we may want to find pairs of textually similar documents in a large c...
Clustering is an essential data mining task with numerous applications. Clustering is the process of...
Statistical topic models such as the Latent Dirichlet Allocation (LDA) have emerged as an attractive...
With the abundance of written information available online, it is useful to be able to automatically...
Discovery of latent semantic groupings and identific...
The challenge of detecting research topics in a specific research field has attracted attention from...
A graph-based distance between Wikipedia articles is defined using a random walk model, which estim...
Abstract. Document clustering techniques mostly rely on single term analysis of text, such as the ve...
Document clustering techniques mostly rely on single term analysis of the document data set, such as...
Abstract: This paper provides a solution to the issue: “How can we use Wikipedia based concepts in d...