A wide range of optimization problems arising in machine learning can be solved by gradient descent algorithms, and a central question in this area is how to efficiently compress a large-scale dataset so as to reduce the computational complexity. {\em Coreset} is a popular data compression technique that has been extensively studied before. However, most of existing coreset methods are problem-dependent and cannot be used as a general tool for a broader range of applications. A key obstacle is that they often rely on the pseudo-dimension and total sensitivity bound that can be very high or hard to obtain. In this paper, based on the ''locality'' property of gradient descent algorithms, we propose a new framework, termed ''sequential coreset...
Appears in Advances in Neural Information Processing Systems 30 (NIPS 2017), 28 pagesInternational a...
Huge scale machine learning problems are nowadays tackled by distributed optimization algorithms, i....
We consider the projected gradient algorithm for the nonconvex best subset selection problem that mi...
Coresets are one of the central methods to facilitate the analysis of large data. We continue a rece...
The coreset paradigm is a fundamental tool for analysing complex and large datasets. Although corese...
In the era of datasets of unprecedented sizes, data compression techniques are an attractive approac...
Coreset selection is powerful in reducing computational costs and accelerating data processing for d...
A major problem for kernel-based predictors is the prohibitive computational complexity, which limit...
The dissertation addresses the research topics of machine learning outlined below. We developed the ...
The k-means problem seeks a clustering that minimizes the sum of squared errors cost function: For i...
Progress in Machine Learning is being driven by continued growth in model size, training data and al...
Motivated by practical generalizations of the classic $k$-median and $k$-means objectives, such as c...
A coreset is a small set that can approximately preserve the structure of the original input data se...
We construct near-optimal coresets for kernel density estimate for points in R^d when the kernel is ...
This thesis studies clustering problems on data streams, specifically with applications to metric sp...
Appears in Advances in Neural Information Processing Systems 30 (NIPS 2017), 28 pagesInternational a...
Huge scale machine learning problems are nowadays tackled by distributed optimization algorithms, i....
We consider the projected gradient algorithm for the nonconvex best subset selection problem that mi...
Coresets are one of the central methods to facilitate the analysis of large data. We continue a rece...
The coreset paradigm is a fundamental tool for analysing complex and large datasets. Although corese...
In the era of datasets of unprecedented sizes, data compression techniques are an attractive approac...
Coreset selection is powerful in reducing computational costs and accelerating data processing for d...
A major problem for kernel-based predictors is the prohibitive computational complexity, which limit...
The dissertation addresses the research topics of machine learning outlined below. We developed the ...
The k-means problem seeks a clustering that minimizes the sum of squared errors cost function: For i...
Progress in Machine Learning is being driven by continued growth in model size, training data and al...
Motivated by practical generalizations of the classic $k$-median and $k$-means objectives, such as c...
A coreset is a small set that can approximately preserve the structure of the original input data se...
We construct near-optimal coresets for kernel density estimate for points in R^d when the kernel is ...
This thesis studies clustering problems on data streams, specifically with applications to metric sp...
Appears in Advances in Neural Information Processing Systems 30 (NIPS 2017), 28 pagesInternational a...
Huge scale machine learning problems are nowadays tackled by distributed optimization algorithms, i....
We consider the projected gradient algorithm for the nonconvex best subset selection problem that mi...