Document clustering has been recognized as a central problem in text data management, and it becomes particularly challenging when documents have multiple topics. In this paper we address the problem of multi-topic document clustering by leveraging the natural composition of documents into text segments, each of which bears one or more topics of its own. We propose a segment-based document clustering framework designed to induce a classification of documents by first identifying cohesive groups of segment-based portions of the original documents. We give empirical evidence of the significance of our approach on several large collections of multi-topic documents.
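The abstract does not say how segments are obtained or which algorithm groups them. The following is a minimal Python sketch of the general segment-then-induce idea, not the authors' method, and it rests on several assumptions:

- Segments are blank-line-separated paragraphs.
- Each segment is represented as a bag of raw term counts.
- Segments are grouped by a simple single-pass leader clustering with a cosine-similarity threshold. This is a stand-in technique.

A document is then assigned to every cluster that contains at least one of its segments, so a multi-topic document can belong to several clusters. All function names and the `threshold` parameter are hypothetical.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def split_segments(doc: str) -> list[str]:
    # Assumption: blank-line-separated paragraphs stand in for text segments.
    return [p.strip() for p in re.split(r"\n\s*\n", doc) if p.strip()]


def bag_of_words(text: str) -> Counter:
    # Lowercased alphabetic tokens with raw term counts (no stemming or stopword removal).
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_segments(vectors: list[Counter], threshold: float) -> list[int]:
    # Stand-in technique: single-pass leader clustering. Each segment joins the
    # most similar existing cluster if cosine similarity >= threshold,
    # otherwise it starts a new cluster.
    centroids: list[Counter] = []
    labels: list[int] = []
    for vec in vectors:
        sims = [cosine(vec, c) for c in centroids]
        if sims and max(sims) >= threshold:
            k = sims.index(max(sims))
            centroids[k] += vec  # summed counts; cosine ignores overall scale
        else:
            k = len(centroids)
            centroids.append(Counter(vec))
        labels.append(k)
    return labels


def segment_based_clustering(docs: list[str], threshold: float = 0.2) -> list[set[int]]:
    owners: list[int] = []
    vectors: list[Counter] = []
    for doc_id, doc in enumerate(docs):
        for seg in split_segments(doc):
            owners.append(doc_id)
            vectors.append(bag_of_words(seg))
    labels = cluster_segments(vectors, threshold)
    # Induce document clusters: a document belongs to every cluster that
    # holds at least one of its segments, so memberships may overlap.
    doc_clusters: list[set[int]] = [set() for _ in docs]
    for doc_id, k in zip(owners, labels):
        doc_clusters[doc_id].add(k)
    return doc_clusters


docs = [
    "python code compiles fast\n\nsoccer match goal scored",
    "python code runs tests",
    "soccer match ended with a goal",
]
print(segment_based_clustering(docs))  # [{0, 1}, {0}, {1}]
```

With the threshold at 0.2, the first document lands in both clusters because its two paragraphs discuss different topics. A faithful implementation would replace the paragraph split and the leader clustering with whatever segmentation and segment-clustering method the framework actually prescribes.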
Increasing progress in numerous research fields and information technologies led to an increase in ...
Document clustering is a very hard task in automatic text processing since it requires extracting re...
Most document clustering algorithms operate in a high dimensional bag-of-words space. The inherent p...
Abstract — The objective of clustering is to partition an unstructured set of objects into clusters ...
In this paper, we introduce a new clustering algorithm for discovering and describing the topics com...
In a world flooded with information, document clustering is an important tool that can help categori...
This paper discusses a new type of semi-supervised document clustering that uses partial supervisio...
Nowadays, the explosive growth in text data emphasizes the need for developing new and computational...
The article addresses the problem of document clustering. The author describes a technology for ...
Document clustering and topic modeling are two closely related tasks which can mutually benefit e...
The proliferation of documents, on both the Web and in private systems, makes knowledge discovery in...
Clustering textual contents is an important step in mining useful information on the web or other te...
Conference date: 09/2008. International audience. An alternative way to tackle Information Retrie...
Document clustering, which is also referred to as text clustering, is a technique of unsupervised doc...