This paper presents a novel design and implementation of the k-means clustering algorithm targeting the Sunway TaihuLight supercomputer. We introduce a multi-level parallel partition approach that partitions not only by dataflow and centroid but also by dimension. Our multi-level (nkd) approach unlocks the potential of the hierarchical parallelism in the SW26010 heterogeneous many-core processor and the system architecture of the supercomputer. Our design can process large-scale clustering problems with up to 196,608 dimensions and over 160,000 target centroids while maintaining high performance and scalability, significantly extending the capability of k-means beyond previous approaches. The evaluation shows our implementation a...
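The idea behind the nkd partitioning above is that the assignment step can be split along three axes: samples (n), centroids (k) and dimensions (d). Splitting by d works because a squared Euclidean distance is a sum over dimensions. Each dimension block can therefore compute a partial sum, and a reduction adds the partial sums together.

The sketch below simulates this sequentially in pure Python. It is not the authors' SW26010 implementation. The helper names `split_range`, `partial_sq_dist` and `assign_nkd`, and the contiguous, near-equal block split, are illustrative assumptions.

```python
from typing import List, Sequence

Vector = Sequence[float]


def split_range(length: int, parts: int) -> List[range]:
    """Split range(length) into `parts` contiguous, nearly equal chunks (parts >= 1)."""
    base, extra = divmod(length, parts)
    chunks, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def partial_sq_dist(x: Vector, c: Vector, dims: range) -> float:
    """Squared Euclidean distance restricted to one dimension block."""
    return sum((x[j] - c[j]) ** 2 for j in dims)


def assign_nkd(points: Sequence[Vector], centroids: Sequence[Vector],
               n_parts: int, k_parts: int, d_parts: int) -> List[int]:
    """Nearest-centroid assignment with hypothetical n/k/d partitioning,
    simulated sequentially (each block would map to a separate worker)."""
    d = len(points[0])
    dim_blocks = split_range(d, d_parts)
    labels = [0] * len(points)
    for rows in split_range(len(points), n_parts):               # n: split samples
        for i in rows:
            best_k, best_dist = -1, float("inf")
            for cents in split_range(len(centroids), k_parts):   # k: split centroids
                for k in cents:
                    # d: each dimension block yields a partial sum; adding
                    # the partial sums plays the role of the reduction step.
                    dist = sum(partial_sq_dist(points[i], centroids[k], dims)
                               for dims in dim_blocks)
                    if dist < best_dist:  # strict '<' keeps the lowest index on ties
                        best_k, best_dist = k, dist
            labels[i] = best_k
    return labels


if __name__ == "__main__":
    pts = [[0.0, 0.0, 0.0, 0.0], [10.0, 10.0, 10.0, 10.0], [1.0, 0.0, 1.0, 0.0]]
    cents = [[0.0] * 4, [10.0] * 4]
    print(assign_nkd(pts, cents, n_parts=2, k_parts=2, d_parts=3))  # [0, 1, 0]
```

Any choice of block counts gives the same labels as an unpartitioned assignment, up to floating-point rounding in the summed partial distances. Only the distribution of work changes.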
The k-means algorithm is one of the most widely used methods in data mining and statistical data analysi...
Clustering can be defined as the process of partitioning a set of patterns into disjoint and homogen...
k-Means is a standard algorithm for clustering data. It constitutes ge...
This article presents an automatic k-means clustering solution targeting the Sunway TaihuLight super...
Funding: J. Thomson and T. Yu are supported by the EPSRC grants "Discovery" EP/P020631/1, "ABC: Adapti...
Funding: UK EPSRC grants "Discovery" EP/P020631/1, "ABC: Adaptive Brokerage for the Cloud" EP/R01052...
Clustering is a popular technique that can help make large datasets more manageable and usable by gr...
In this paper, a configurable many-core hardware/software architecture is proposed to efficiently ...
Handling and processing larger volumes of data requires efficient data mining algorithms. k-means ...
G-means is a data mining clustering algorithm based on k-means, used to find the number of Gaussian ...
Clustering approaches are widely used methodologies to analyse large data sets. The K-means algorith...
To cluster increasingly massive data sets that are common today in data and text mining, w...
The current state and foreseeable future of high-performance scientific computing (HPC) can be descr...
K-Means is a popular clustering algorithm that adopts an iterative refinement procedure to determin...
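The iterative refinement procedure mentioned above is commonly realized as Lloyd's algorithm. Each iteration has two steps. First, every point is assigned to its nearest centroid. Second, each centroid is recomputed as the mean of its assigned points. The loop stops when the assignments no longer change.

The pure-Python sketch below illustrates that loop. The function names and the `max_iter` cap are illustrative assumptions, not taken from any of the works listed here. Keeping an empty cluster's centroid unchanged is likewise only one simple convention among several.

```python
from typing import List, Sequence, Tuple


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd_kmeans(points: Sequence[Sequence[float]],
                 init_centroids: Sequence[Sequence[float]],
                 max_iter: int = 100) -> Tuple[List[List[float]], List[int]]:
    """Lloyd-style k-means: alternate assignment and update until stable."""
    centroids = [list(c) for c in init_centroids]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [min(range(len(centroids)),
                          key=lambda k: sq_dist(p, centroids[k]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


if __name__ == "__main__":
    pts = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    cents, labs = lloyd_kmeans(pts, [[0.0, 0.0], [10.0, 10.0]])
    print(cents, labs)  # [[0.0, 0.5], [10.0, 10.5]] [0, 0, 1, 1]
```

The assignment step dominates the cost at O(n·k·d) per iteration. That is the part which the partitioning schemes in the parallel designs above target.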