The goal of this paper is to discuss statistical aspects of clustering in a framework where the data to be clustered has been sampled from some unknown probability distribution. Firstly, the clustering of the data set should reveal some structure of the underlying data rather than model artifacts due to the random sampling process. Secondly, the more sample points we have, the more reliable the clustering should be. We discuss which methods can and cannot be used to tackle those problems. In particular we argue that generalization bounds as they are used in statistical learning theory of classification are unsuitable in a general clustering framework. We suggest that the main replacements of generalization bounds should be convergence proof...
Clustering is often formulated as a discrete optimization problem. The objective is to find, among a...
Clustering is used in identifying groups of samples with similar properties, and it is one of the mo...
Numerous papers ask how difficult it is to cluster data. We suggest that the more relevant and inter...
There are many algorithms to cluster sample data points based on nearness or a similarity measure. ...
There is a widespread belief that clustering is inherently subjective. To quote A. K. Jain, "As a ta...
We address the problem of communicating domain knowledge from a user to the designer of a clusterin...
We argue that when objects are characterized by many attributes, clustering them on the basis of a r...
Clustering is a common technique for statistical data analysis, which is used in many fields, includ...
Clustering is a central approach for unsupervised learning. After clustering is applied, the most fu...
Clustering is the unsupervised classification of patterns (observations, data items, or feature vect...
Abstract. This talk is an attempt at structuring and systematising the development of clustering as...
We propose a novel method for clustering data which is grounded in information-theoretic principles...