Kernel methods play a central role in machine learning and statistics, but algorithms for such methods scale poorly to large, high-dimensional datasets. Kernel sum computations are often the bottleneck, as they must aggregate all pairwise interactions between a query and each element of the dataset. Prior research has resulted in fast methods to approximate this sum with coresets, kernel approximations and adaptive sampling. However, existing methods still have prohibitively high memory and computation costs, especially for emerging applications in web-scale learning, genomics and streaming data. In my work, I have developed a compressed summary of the dataset, or sketch, that supports fast approximate sum queries for a special class of ker...
The main contribution of the thesis is the development of a fast library for approximating kernel ex...
This work comprises two projects in numerical linear algebra. The first project is a...
Massive high-dimensional data sets are ubiquitous in all scientific disciplines. Extracting meaningf...
Huge data sets containing millions of training examples with a large number of attributes are relati...
Traditional machine learning has been largely concerned with developing techniques for small or mode...
With the fast growth of large-scale, high-dimensional datasets, large-scale machine learning and ...
The class of computational problems I consider in this thesis shares the common trait of requiring ...
Abstract. We present a fast algorithm for kernel summation problems in high dimensions. These proble...
Learning a computationally efficient kernel from data is an important machine learning problem. The ...
There is an increasing demand from businesses and industries to make the best use of their data. Clu...
Abstract. We consider fast kernel summations in high dimensions: given a large set of points in d di...
Kernel ridge regression (KRR) is a popular scheme for non-linear non-parametric learning. However, e...
The availability of large and rich quantities of text data is due to the emergence of the World Wide...
Pervasive and networked computers have dramatically reduced the cost of collecting and distributing ...
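Several of the abstracts above concern the kernel sum problem: aggregating the kernel value between a query and every point in a dataset. As a point of reference only, the sketch below shows the exact sum, which costs one kernel evaluation per data point, next to a plain uniform-sampling estimate. It is not the method of any of the works listed. The Gaussian kernel, the function names, and the `bandwidth` and `num_samples` parameters are all assumptions chosen for illustration.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel between two equal-length points (illustrative choice)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * bandwidth ** 2))


def exact_kernel_sum(query, data, bandwidth=1.0):
    """Exact kernel sum: one kernel evaluation per data point."""
    return sum(gaussian_kernel(query, point, bandwidth) for point in data)


def sampled_kernel_sum(query, data, num_samples, rng, bandwidth=1.0):
    """Uniform-sampling baseline (hypothetical helper, not from the sources).

    Draws num_samples points with replacement, averages their kernel
    values, and rescales by the dataset size. The estimate is unbiased,
    but its variance can be large when few points dominate the sum.
    """
    total = 0.0
    for _ in range(num_samples):
        point = rng.choice(data)
        total += gaussian_kernel(query, point, bandwidth)
    return len(data) * total / num_samples


if __name__ == "__main__":
    rng = random.Random(0)
    data = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(10000)]
    query = (0.0, 0.0)
    print("exact:  ", exact_kernel_sum(query, data))
    print("sampled:", sampled_kernel_sum(query, data, 500, rng))
```

The sampled estimate evaluates the kernel `num_samples` times instead of once per data point. That is the basic accuracy-for-cost trade that coresets, kernel approximations, adaptive sampling, and sketches each try to improve on.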