In the era of datasets of unprecedented sizes, data compression techniques are an attractive approach for speeding up machine learning algorithms. One of the most successful paradigms for achieving good-quality compression is that of coresets: small summaries of data that act as proxies for the original input data. Even though coresets have proved extremely useful for accelerating unsupervised learning problems, applying them to supervised learning problems may introduce unexpected computational bottlenecks. We show that this is the case for logistic regression classification, and hence propose two methods for accelerating the computation of coresets for this problem. When coresets are computed using our methods on three public datasets, computing ...
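To make the idea of a coreset as a weighted proxy concrete, here is a minimal sketch for the logistic regression loss. It is not the method proposed in the abstract: it assumes the simplest possible construction, a uniform sample of m points each weighted n/m, and the helper names (`logistic_loss`, `uniform_coreset`), the synthetic data, and the sizes are all hypothetical choices made for illustration.

```python
import numpy as np


def logistic_loss(X, y, w, weights=None):
    """Sum (optionally weighted) of logistic losses, with labels y in {-1, +1}."""
    margins = y * (X @ w)
    # log(1 + exp(-margin)), computed in a numerically stable way
    losses = np.logaddexp(0.0, -margins)
    if weights is None:
        return losses.sum()
    return (weights * losses).sum()


def uniform_coreset(X, y, m, rng):
    """Hypothetical helper: sample m points uniformly, weight each by n/m.

    This is only a baseline; it is assumed here for illustration and is
    not the construction described in the abstract.
    """
    n = X.shape[0]
    idx = rng.choice(n, size=m, replace=False)
    weights = np.full(m, n / m)
    return X[idx], y[idx], weights


# Synthetic data (an assumption of this sketch, not data from the paper).
rng = np.random.default_rng(0)
n, d, m = 20000, 5, 2000
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = np.where(X @ w_true + rng.normal(size=n) > 0, 1.0, -1.0)

Xc, yc, wc = uniform_coreset(X, y, m, rng)

# Compare the full loss with the weighted coreset loss at one query vector.
w_query = rng.normal(size=d)
full = logistic_loss(X, y, w_query)
approx = logistic_loss(Xc, yc, w_query, wc)
rel_err = abs(full - approx) / full
print(f"full={full:.1f} coreset={approx:.1f} relative error={rel_err:.3f}")
```

The weighted loss on the 2,000 sampled points stands in for the loss on all 20,000 points. Uniform sampling gives an unbiased estimate but no worst-case guarantee over all parameter vectors, which is why coreset constructions typically replace it with a non-uniform, importance-based sampling distribution.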
The k-means problem seeks a clustering that minimizes the sum of squared errors cost function: For i...
In this thesis we seek to make advances towards the goal of effective learned compression. This enta...
We study the problem of constructing coresets for clustering problems with time series data. This pr...
The coreset paradigm is a fundamental tool for analysing complex and large datasets. Although corese...
Coresets are one of the central methods to facilitate the analysis of large data. We continue a rece...
The massive growth of modern datasets from different sources such as videos, social networks, and se...
A coreset is a small set that can approximately preserve the structure of the original input data se...
Thesis: Ph. D. in Computer Science and Engineering, Massachusetts Institute of Technology, Departmen...
The focus of this thesis is fast and robust adaptations of logistic regression (LR) for data mining ...
A wide range of optimization problems arising in machine learning can be solved by gradient descent ...
Coresets are among the most popular paradigms for summarizing data. In particular, there exist many ...
Scalable training of Bayesian nonparametric models is a notoriously difficult challenge. We explore ...
This paper introduces the problem of coresets for regression problems to panel data settings. We firs...
Subsampling algorithms are a natural approach to reduce data size before fitting models on massive d...