We study clustering algorithms based on neighborhood graphs on a random sample of data points. The question we ask is how such a graph should be constructed in order to obtain optimal clustering results. Which type of neighborhood graph should one choose, mutual k-nearest-neighbor or symmetric k-nearest-neighbor? What is the optimal parameter k? In our setting, clusters are defined as connected components of the t-level set of the underlying probability distribution. Clusters are said to be identified in the neighborhood graph if connected components in the graph correspond to the true underlying clusters. Using techniques from random geometric graph theory, we prove bounds on the probability that clusters are identified successfully, both ...
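The two constructions named in the question can be compared with a minimal sketch. It is written in Python because the abstract specifies no implementation. It assumes Euclidean distance and brute-force neighbor search, and the helper names `knn_sets`, `knn_graph` and `connected_components` are hypothetical. In the symmetric kNN graph, points i and j are joined if either one is among the other's k nearest neighbors. In the mutual kNN graph, they are joined only if each is among the other's k nearest neighbors. Candidate clusters are then read off as the connected components of the graph.

```python
import math


def knn_sets(points, k):
    """For each point index, return the set of its k nearest neighbours.

    Hypothetical helper: brute-force Euclidean search, ties broken by index.
    """
    n = len(points)
    neighbours = []
    for i in range(n):
        others = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        neighbours.append({j for _, j in others[:k]})
    return neighbours


def knn_graph(points, k, mutual=False):
    """Undirected edge set of the symmetric (default) or mutual kNN graph."""
    nbrs = knn_sets(points, k)
    edges = set()
    for i in range(len(points)):
        for j in nbrs[i]:
            # Symmetric: keep the edge if j is in kNN(i) OR i is in kNN(j)
            # (the OR case is covered when the loop reaches j).
            # Mutual: keep it only if both relations hold.
            if not mutual or i in nbrs[j]:
                edges.add((min(i, j), max(i, j)))
    return edges


def connected_components(n, edges):
    """Connected components via union-find, as sorted lists of vertex indices."""
    parent = list(range(n))

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())


# Two tight groups of four points plus one isolated point at (5, 5).
pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (10, 10), (10, 11), (11, 10), (11, 11),
       (5, 5)]
print(connected_components(len(pts), knn_graph(pts, 2)))
# [[0, 1, 2, 3, 8], [4, 5, 6, 7]]
print(connected_components(len(pts), knn_graph(pts, 2, mutual=True)))
# [[0, 1, 2, 3], [4, 5, 6, 7], [8]]
```

On this toy data with k = 2, the isolated point attaches to the nearby group in the symmetric graph but remains a singleton in the mutual graph. This follows because the mutual graph is always a subgraph of the symmetric one. The example only illustrates the difference between the two graph types. It does not reproduce the probability bounds the abstract refers to.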
The problem of graph clustering is a central optimization problem with various applications in numer...
K-Nearest-Neighbors (KNN) graphs are central to many emblematic data mining and machine-learning app...
Clustering is indispensable for data analysis in many scientific disciplines. Detecting clusters fro...
Assume we are given a sample of points from some underlying distribution which contains several dist...
We present a procedure for the identification of clusters in multivariate data sets, based on the co...
Nearest neighbor ($k$-NN) graphs are widely used in machine learning and data mining applications, a...
The "nearest-neighbor" relation, or more generally the "k-nearest-neighbors" relation, defined f...
Data clustering is a fundamental machine learning problem. Community structure is common in social a...
The "nearest neighbor" relation, or more generally the "k nearest neighbors" relation, defined f...
Graph clustering methods such as spectral clustering are defined for general weighted graphs. In mac...
Spectral clustering is a well-known graph-theoretic clustering algorithm. Although spectral clusteri...
Clustering is a well-known data mining technique which is used to group together data item...