An automated method for clustering terms/concepts from a set of documents on the same topic was developed for the purpose of multidocument summarization. The clustering method combines lexical overlap between multiword terms, syntactic constraints, and semantic considerations based on a manually constructed taxonomy to generate hierarchically organized clusters of terms. This study evaluates the machine-generated clusters by calculating the proportion of overlap with two sets of human-generated clusters for 15 topics. It was found that the overlap between the machine-generated clusters and each set of human-generated clusters is higher than the overlap between the two sets of human-generated clusters. A qualitative analysis of the human cluster...
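The abstract does not spell out how the proportion of overlap is computed, so the following is only a minimal sketch under one assumed definition: two clusterings overlap to the extent that they place the same pairs of terms in the same cluster. The function names `co_clustered_pairs` and `overlap_proportion` and the toy term lists are hypothetical illustrations, not the study's measure or data.

```python
from itertools import combinations


def co_clustered_pairs(clusters):
    """Return the set of unordered term pairs that share a cluster."""
    pairs = set()
    for cluster in clusters:
        # Sorting gives each pair one canonical order, so (a, b) == (b, a).
        for a, b in combinations(sorted(set(cluster)), 2):
            pairs.add((a, b))
    return pairs


def overlap_proportion(candidate, reference):
    """Share of the reference's co-clustered pairs also found in the candidate.

    Assumed definition for illustration only; the source does not define it.
    """
    ref_pairs = co_clustered_pairs(reference)
    if not ref_pairs:
        return 0.0
    cand_pairs = co_clustered_pairs(candidate)
    return len(ref_pairs & cand_pairs) / len(ref_pairs)


# Invented toy clusterings of the same five terms.
machine = [["heart attack", "heart disease", "cardiac arrest"],
           ["blood pressure", "blood test"]]
human_a = [["heart attack", "heart disease"],
           ["cardiac arrest", "blood pressure", "blood test"]]
human_b = [["heart attack", "cardiac arrest"],
           ["heart disease"],
           ["blood pressure", "blood test"]]

print(overlap_proportion(machine, human_a))  # machine vs. first human set
print(overlap_proportion(machine, human_b))  # machine vs. second human set
print(overlap_proportion(human_a, human_b))  # human vs. human baseline
```

Comparing the first two values with the third mirrors the kind of machine-versus-human and human-versus-human comparison the abstract reports. Note that this measure is asymmetric: swapping `candidate` and `reference` can change the result.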
Manual document categorization is time consuming, expensive, and difficult to manage for large colle...
This paper shows how Wikipedia and the semantic knowledge it contains can be exploited for document ...
This paper discusses a new type of semi-supervised document clustering that uses partial supervisio...
Vast amounts of text documents are available in various fields. The accumulations of available text ...
Humans are used to expressing themselves with written language and language provides a medium with w...
Thematic organization of text is a natural practice of humans and a crucial task for today's vast re...
Document clustering, which is also referred to as text clustering, is a technique of unsupervised doc...
Abstract: Clustering is a technique of collecting data into subsets in such a manner that identical ...
Abstract: Most of the common techniques of text mining are based on the statistical analysis of the ...
The article addresses the problem of document clusterization. The author describes a technology for ...
This paper presents work in progress on clustering methods that identify semantic concepts in a docu...
We consider a challenging clustering task: the clustering of multi-word terms ...
The constant success of the Internet made the number of text documents in electronic forms increases...
We investigate four hierarchical clustering methods (single-link, complete-link, groupwise-average, ...
Documents Clustering is a technique in which relationships between sets of documents are being autom...