Abstract. It is well known that connectivity analysis of linked documents provides significant information about the structure of the document space for unsupervised learning tasks. However, the ability to identify distinct clusters of documents based on link graph analysis is proportional to the density of the graph and depends on the availability of the linking and/or linked documents in the collection. In this paper, we present an information-theoretic approach to measuring the significance of individual words based on the underlying link structure of the document collection. This enables us to generate a non-uniform weight distribution over the feature space, which is used to augment the original corpus-based document similarities. The...
Abstract: This paper provides a solution to the issue: “How can we use Wikipedia based concepts in d...
Similarity of semantic content of web pages is displayed using interactive graphs presenting fragmen...
Abstract: We propose a method for computing semantic relatedness between words or texts by using knowl...
Abstract: In this paper, a unified framework for clustering documents based on vocabulary overlap an...
There are many scenarios where we may want to find pairs of textually similar documents in a large c...
Clustering is an essential data mining task with numerous applications. Clustering is the process of...
Statistical topic models such as the Latent Dirichlet Allocation (LDA) have emerged as an attractive...
Discovery of latent semantic groupings and identific...
With the abundance of written information available online, it is useful to be able to automatically...
The challenge of detecting research topics in a specific research field has attracted attention from...
A graph-based distance between Wikipedia articles is defined using a random walk model, which estim...
Document clustering techniques mostly rely on single term analysis of the document data set, such as...
Abstract. Document clustering techniques mostly rely on single term analysis of text, such as the ve...
We propose a graph-based representation of text collections where the nodes are textual units such a...