2000 Mathematics Subject Classification: 62H30

This paper describes a statistics-based methodology for unsupervised document clustering and cluster topic extraction. For this purpose, multiword lexical units (MWUs) of any length are automatically extracted from corpora using the LiPXtractor extractor, a language-independent, statistics-based tool. The MWUs are taken as base features to describe documents. These features are transformed and a document similarity matrix is constructed. From this matrix, a reduced set of features is selected using an approach based on Principal Component Analysis. Then, using the Model Based Clustering Analysis software, it is possible to obtain the best number of clusters. Precision and Recall for document-cl...
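As a rough illustration of the step in which documents are described by MWU features and compared in a similarity matrix, here is a minimal sketch. It is not the paper's implementation: the abstract does not state which feature transformation or similarity measure is used, so raw MWU counts and cosine similarity are assumptions, and the names `mwu_features`, `cosine`, and `similarity_matrix` are hypothetical. The MWU list is supplied by hand here instead of being extracted by LiPXtractor.

```python
import math
from collections import Counter


def mwu_features(doc_text, mwus):
    """Count how often each known MWU occurs in a document.

    Assumption: naive case-insensitive substring counting stands in for
    whatever matching the real extractor performs.
    """
    text = doc_text.lower()
    return Counter({m: text.count(m.lower()) for m in mwus if m.lower() in text})


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (assumed measure)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a document with no known MWUs is similar to nothing
    return dot / (norm_a * norm_b)


def similarity_matrix(docs, mwus):
    """Build the symmetric document-by-document similarity matrix."""
    feats = [mwu_features(d, mwus) for d in docs]
    return [[cosine(fi, fj) for fj in feats] for fi in feats]


# Hypothetical toy corpus and MWU list, for illustration only.
docs = [
    "Reform of the common agricultural policy was debated.",
    "The common agricultural policy budget was approved.",
    "The bank announced its interest rate decision.",
]
mwus = ["common agricultural policy", "interest rate"]
sim = similarity_matrix(docs, mwus)
```

In this toy corpus the first two documents share an MWU and the third shares none with them, so the matrix separates the two groups. A feature-reduction step such as PCA and a model-based clustering step would then operate on a matrix of this kind.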
In this paper a novel method is proposed for scientific document clustering. The proposed method...
A major challenge in document clustering research arises from the growing amou...
Clustering is one of the most researched areas of data mining applications in the contemporary liter...
We investigate four hierarchical clustering methods (single-link, complete-link, groupwise-average, ...
We consider a challenging clustering task: the clustering of multi-word terms ...
We consider a challenging clustering task: the clustering of multi-word terms without document co-occ...
Most document clustering algorithms operate in a high dimensional bag-of-words space. The inherent p...
We applied different clustering algorithms to the task of clustering multi-w...
Document clustering is a technique in which relationships between sets of documents are being autom...
The article addresses the problem of document clustering. The author describes a technology for ...
Document clustering incorporates a number of data mining techniques, and to achieve good clustering ...
In a world flooded with information, document clustering is an important tool that can help categori...
In this paper, we introduce a new clustering algorithm for discovering and describing the topics com...
Since the amount of text data stored in computer repositories is growing every day, we need more tha...
Document clustering is primarily a method applied for uncomplicated document search, analysis an...