In this paper, we address the problem of high-dimensional k-means clustering in a large-scale setting, i.e., for datasets that comprise a large number of items. Sketching techniques have already been used to deal with this “large-scale” issue, by compressing the whole dataset into a single vector of random nonlinear generalized moments from which the k centroids are then retrieved efficiently. However, the cost of this approach usually scales quadratically with the dimension; to cope with high-dimensional datasets, we show how to use fast structured random matrices to compute the sketching operator efficiently. This yields significant speed-ups and memory savings for high-dimensional data, while the clustering results are shown to...
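To make the idea of the abstract above concrete, here is a minimal NumPy sketch of how a dataset might be compressed into a vector of random nonlinear moments using a fast structured projection. It is an illustration under stated assumptions, not the paper's implementation: the nonlinearity (complex exponentials), the structured block (three rounds of random sign flips followed by a Walsh-Hadamard transform), and the names `fwht`, `structured_project`, `sketch`, `signs` and `radii` are all choices made here for the example. A dense d-by-d projection costs O(d^2) per sample, whereas the fast transform below costs O(d log d).

```python
import numpy as np

def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform along the last axis.

    The last dimension must be a power of two. Cost is O(d log d) per row.
    """
    x = np.array(x, dtype=float, copy=True)
    d = x.shape[-1]
    lead = x.shape[:-1]
    h = 1
    while h < d:
        # Butterfly step: combine blocks of length h pairwise.
        y = x.reshape(lead + (d // (2 * h), 2, h))
        a = y[..., 0, :] + y[..., 1, :]
        b = y[..., 0, :] - y[..., 1, :]
        x = np.stack([a, b], axis=-2).reshape(lead + (d,))
        h *= 2
    return x

def structured_project(X, signs, radii):
    """Apply W = diag(radii) (H D3)(H D2)(H D1) / d^(3/2) to every row of X.

    signs: array of shape (3, d) with entries +1/-1 (the diagonals D1, D2, D3).
    radii: array of shape (d,), a hypothetical per-frequency scaling.
    """
    d = X.shape[1]
    Y = X
    for s in signs:
        Y = fwht(Y * s) / np.sqrt(d)  # sign flip, then normalized Hadamard
    return Y * radii

def sketch(projections):
    """Average the complex exponentials of the projections over all samples."""
    return np.exp(1j * projections).mean(axis=0)

# Toy usage with synthetic data (all values here are illustrative).
rng = np.random.default_rng(0)
n, d = 1000, 64  # d must be a power of two for this simple transform
X = rng.normal(size=(n, d))
signs = rng.choice([-1.0, 1.0], size=(3, d))
radii = np.abs(rng.normal(size=d))  # assumed scaling, not taken from the paper
z = sketch(structured_project(X, signs, radii))  # one vector summarizing X
```

The sketch `z` has a fixed size that does not depend on the number of samples, and the projection matrix is never stored explicitly: only the sign vectors and scalings are kept, which is where the memory savings of a structured construction would come from.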
Thesis: S.M. in Computer Science and Engineering, Massachusetts Institute of Technology, Department ...
Many applications require the clustering of large amounts of high-dimensional data. Most clustering ...
This paper presents an algorithm based on the Growing Self Organizing Map (GSOM) called the High Dim...
Large-scale clustering has been widely used in many applications, and has received much attention. M...
Copyright © 2017, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rig...
In sketched clustering, a dataset of T samples is first sketched down to a vec...
Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Comput...
The Lloyd-Max algorithm is a classical approach to perform K-means clustering....
Projection methods for dimension reduction have enabled the discovery of otherwise unattainable stru...
Clustering high-dimensional data is more difficult than clustering low-dimensional data. The problem...
More and more data are produced every day. Some clustering techniques have been developed to automat...
Subspace clustering is a popular method for clustering unlabelled data. However, the computational c...
The purpose of this thesis is to present our research works on some of the fundamental issues encoun...
The exploratory nature of data analysis and data mining makes clustering one of the most usual tasks...