Statistical methods for high-dimensional data analysis

Gupta, Abhishek

Publication date

January 2008

Publisher

ScholarlyCommons

Abstract

High-dimensional data are increasingly pervasive and bring new problems and opportunities for data analysis. This thesis develops methods for both supervised and unsupervised learning from high-dimensional data. The first topic is unsupervised metric learning in the context of clustering. We propose a criterion, the blur ratio, whose minimization yields a transformation (distance metric) that gives well-separated and predictable clusters. To minimize it, we propose an iterative procedure, Clustering Predictions of Cluster Membership (CPCM), which alternates between predicting cluster memberships and clustering those predictions. With linear regression and k-means, this algorithm is guaranteed to converge to a fixed point. The resulting...
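The alternation described in the abstract can be illustrated with a rough sketch. This is not the thesis's implementation: the abstract does not define the blur ratio, so the code does not compute it, and the one-hot encoding of memberships, the intercept column, the farthest-first k-means initialization, the starting partition, and the stopping rule are all assumptions made for illustration. The names `kmeans`, `canonical`, and `cpcm_sketch` are hypothetical.

```python
import numpy as np


def kmeans(Z, k, n_iter=50):
    """Plain Lloyd's k-means with a deterministic farthest-first start (an assumption)."""
    centers = [Z[0]]
    for _ in range(1, k):
        # Squared distance from each point to its nearest chosen center.
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[d.argmax()])
    centers = np.array(centers, dtype=float)
    labels = np.zeros(len(Z), dtype=int)
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels


def canonical(labels):
    """Relabel clusters by order of first appearance so partitions can be compared."""
    mapping = {}
    return np.array([mapping.setdefault(v, len(mapping)) for v in labels.tolist()])


def cpcm_sketch(X, k, n_rounds=20):
    """Alternate between (1) regressing memberships on X and (2) clustering the predictions."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # intercept column (assumption)
    labels = canonical(kmeans(X, k))           # starting partition (assumption)
    for _ in range(n_rounds):
        Y = np.eye(k)[labels]                  # one-hot cluster memberships
        coef, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
        P = Xb @ coef                          # predicted memberships
        new_labels = canonical(kmeans(P, k))   # cluster the predictions
        if np.array_equal(new_labels, labels): # partition unchanged: stop
            break
        labels = new_labels
    return labels


# Toy data: two groups separated along the first coordinate, plus noise coordinates.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 5))
X[30:, 0] += 20.0
labels = cpcm_sketch(X, k=2)
```

The loop stops when clustering the predicted memberships reproduces the current partition, which is one simple way to read "fixed point" here; the thesis may define and detect convergence differently.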
