We present efficient algorithms for all-point-pairs problems, or 'N-body'-like problems, which are ubiquitous in statistical learning. We focus on six examples, including nearest-neighbor classification, kernel density estimation, outlier detection, and the two-point correlation. These include any problem which abstractly requires a comparison of each of the N points in a dataset with each other point and would naively be solved using N^2 distance computations. In practice N is often large enough to make this infeasible. We present a suite of new geometric techniques which are applicable in principle to any 'N-body' computation including large-scale mixtures of Gaussians, RBF neural networks, and HMMs. Our algo...
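To make the naive quadratic cost concrete, here is a minimal sketch of the brute-force baseline for one of the named examples, the two-point correlation, taken here to mean counting the pairs of points that lie closer than a radius r. The function name `two_point_count`, the strict `<` threshold, and the sample points are assumptions made for illustration; this is the all-pairs scan that the abstract calls infeasible for large N, not the geometric method the paper proposes.

```python
from itertools import combinations
from math import dist


def two_point_count(points, r):
    """Count unordered pairs of points whose Euclidean distance is below r.

    Brute force: every pair is examined once, so the work grows as
    N * (N - 1) / 2 distance computations.
    """
    return sum(1 for p, q in combinations(points, 2) if dist(p, q) < r)


# Hypothetical sample data: three nearby points and one far away.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
print(two_point_count(points, 1.5))  # 3
```

Doubling N roughly quadruples the number of distance evaluations, which is why this baseline stops being practical on large datasets.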
Nearest neighbor (NN) classifiers rely on a distance metric either a priori fixed or previously esti...
Abstract. Many fundamental statistical methods have become critical tools for scientific data analys...
In many areas of machine learning, the characterization of the input data is given by a form of prox...
Several key computational bottlenecks in machine learning involve pairwise distance computations, i...
The Fast Multipole Method of Greengard and Rokhlin does the seemingly impossible: it approximates th...
We describe a recursive algorithm to quickly compute the N nearest neighbors according to a similari...
This paper addresses the issue of numerical computation in machine learning domains where one needs ...
Abstract: "We present a randomized algorithm for semi-supervised learning of Mahalanobis metrics ove...
In the first part of this thesis, we examine the computational complexity of three fundamental stati...
In this dissertation, we explore parallel algorithms for general N-Body problems in high dimensions,...
The notion of similarities between data points is central to many classification and clustering algo...
In the wake of the Big Data phenomenon, the computing world has seen a number of computational parad...
We propose a classification scheme that generates a set of balls separating points belonging to di...
The Closest Pair problem aims to identify the closest pair (using some similarity measure, e.g., Euc...
We discuss a strategy for polychotomous classification that involves estimating class probabilities ...