This thesis focuses on solving the $K$-means clustering problem approximately with side information provided by crowdsourcing. Both the binary same-cluster oracle and the general crowdsourcing framework are considered. It can be shown that, under some mild assumptions on the smallest cluster size, one can obtain a $(1+\epsilon)$-approximation of the optimal potential with probability at least $1-\delta$, where $\epsilon>0$ and $\delta\in(0,1)$, using an expected number of $O(\frac{K^3}{\epsilon \delta})$ noiseless same-cluster queries and comparison-based clustering of complexity $O(ndK + \frac{K^3}{\epsilon \delta})$; here, $n$ denotes the number of points and $d$ the dimension of the space. Compared to a handful of other known approaches that perfor...
Classical clustering problems search for a partition of objects into a fixed number of clusters. In ...
Clustering problems and clustering algorithms are often overly sensitive to the presence of outliers...
Due to current data collection technology, our ability to gather data has surpassed our ability to a...
We consider the problem of clustering n items into K disjoint clusters using noisy answers from crow...
We study k-means clustering in a semi-supervised setting. Given an oracle that returns whether two g...
A wide range of applications in engineering as well as the natural and social sciences have datasets...
This thesis is divided into two parts. In part one, we study the k-median and the k-means clustering...
Advances in recent techniques for scientific data collection in the era of big data allow for the sy...
General purpose and highly applicable clustering methods are usually required during the early stage...
Recent developments in local search analysis have yielded the first polynomial-time approximation sc...
Clustering has been one of the most widely studied topics in data mining and it is often the first s...
Crowdsourcing utilizes human ability by distributing tasks to a large number of workers. It is espec...
This dissertation largely studies problems of two types. In the first part, we study ranking and clu...
This work studies clustering algorithms that operate with ordinal or comparison-based queries (ope...