High-dimensional data are increasingly pervasive and bring new problems and opportunities for data analysis. This thesis develops methods for both supervised and unsupervised learning from high-dimensional data. The first topic is unsupervised metric learning in the context of clustering. We propose a criterion, the blur ratio, whose minimization yields a transformation (distance metric) that gives well-separated and predictable clusters. To minimize it, we propose an iterative procedure, Clustering Predictions of Cluster Membership (CPCM), which alternately predicts cluster memberships and clusters these predictions. With linear regression and k-means, this algorithm is guaranteed to converge to a fixed point. The resulting...
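The alternation described in this abstract (predict cluster memberships, then cluster the predictions) can be illustrated with a minimal sketch. It is one plausible reading of the abstract, not the thesis's implementation: the one-hot membership coding, the intercept column, the warm-started Lloyd iterations, the stopping rule, and the names `kmeans_from_labels` and `cpcm_sketch` are all assumptions made for illustration. The blur ratio itself is not computed, since the abstract does not define it.

```python
import numpy as np

def kmeans_from_labels(Y, labels, k, n_iter=100):
    """Lloyd's k-means on the rows of Y, warm-started from the given labels."""
    labels = labels.copy()
    centers = np.zeros((k, Y.shape[1]))
    for _ in range(n_iter):
        for j in range(k):
            members = Y[labels == j]
            if len(members) > 0:
                # An empty cluster keeps its previous center (zeros at the start).
                centers[j] = members.mean(axis=0)
        # Squared Euclidean distance from every row to every center.
        dists = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

def cpcm_sketch(X, k, init_labels, max_iter=50):
    """Alternate (1) regressing memberships on X and (2) clustering the fits."""
    n = X.shape[0]
    X_aug = np.hstack([np.ones((n, 1)), X])      # assumed: intercept column
    labels = np.asarray(init_labels).copy()
    B = None
    for _ in range(max_iter):
        Z = np.eye(k)[labels]                    # assumed: one-hot memberships
        # Step 1: predict memberships with ordinary least squares.
        B, *_ = np.linalg.lstsq(X_aug, Z, rcond=None)
        Z_hat = X_aug @ B
        # Step 2: cluster the predicted memberships with k-means.
        new_labels = kmeans_from_labels(Z_hat, labels, k)
        if np.array_equal(new_labels, labels):   # labels stopped changing
            break
        labels = new_labels
    return labels, B

# Toy data (hypothetical): two well-separated Gaussian blobs in 2-D,
# started from labels in which a few points are deliberately wrong.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(30, 2)),
               rng.normal(20.0, 1.0, size=(30, 2))])
init_labels = np.repeat([0, 1], 30)
init_labels[:3] = 1
init_labels[30:33] = 0

labels, B = cpcm_sketch(X, k=2, init_labels=init_labels)
```

In this sketch the coefficient matrix `B` plays the role of the learned linear transformation: clustering happens in the space of fitted memberships rather than in the original feature space. Warm-starting k-means from the current labels is a design choice of the sketch, and whether it matches the conditions behind the convergence guarantee stated above is not something the abstract settles.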
We introduce a robust k-means-based clustering method for high-dimensional data where not only outli...
Feature selection is an important research area that seeks to eliminate unwanted features from datas...
Given the enormous amount of data produced each day, it would be immensely useful if we could use it to l...
Distance-based learning methods, like clustering and SVMs, are dependent on good distance metrics. T...
The thesis tackles the problem of uncovering hidden structures in high-dimensional data in the prese...
Fast and effective unsupervised clustering is a fundamental tool in unsupervised learning. Here is a n...
Clustering is an important ingredient of unsupervised learning; classical clustering methods include...
Clustering high-dimensional data often requires some form of dimensionality reduction, where cluster...
Unsupervised and semi-supervised learning are explored in convex clustering with metric learning whi...
We present a nonparametric method for selecting informative features in high-dimensional clustering ...
Fast accumulation of large amounts of complex data has created a need for more sophisticated statisti...
Clustering is a central topic in unsupervised learning and has a wide variety of applications. Howev...
In this thesis, we present new developments of hierarchical clustering in high-dimensional data. We ...
Learning a statistical model for high-dimensional data is an important topic in machine learning. Al...
The purpose of this thesis is to present our research work on some of the fundamental issues encoun...