The coreset paradigm is a fundamental tool for analysing large, complex datasets. Although coresets are used to accelerate many learning problems, the algorithms that construct them can themselves become computationally expensive in some settings. We show that this happens readily when computing coresets for learning a logistic regression classifier. We overcome this issue with two methods: Accelerating Clustering via Sampling (ACvS) and the Regressed Data Summarisation Framework (RDSF). The former is an acceleration procedure based on a simple theoretical observation about using uniform random sampling for clustering problems; the latter is a coreset-based data-summarising framework that builds on ACvS and extends it by using...
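The abstract does not spell out how ACvS works, so the following is only a minimal sketch of the general idea it names: cluster a uniform random sample instead of the full dataset, then reuse the resulting centres. It assumes plain Lloyd's k-means as the clustering step, and every function name here (`lloyd_kmeans`, `kmeans_cost`, `cluster_via_uniform_sample`) is hypothetical rather than taken from the paper.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd_kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm (assumed clustering step, not the paper's)."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # initialise with k distinct input points
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centres[c]))
            groups[nearest].append(p)
        # Move each centre to the mean of its group; keep it if the group is empty.
        centres = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centres[j]
            for j, g in enumerate(groups)
        ]
    return centres


def kmeans_cost(points, centres):
    """Sum of squared distances from each point to its nearest centre."""
    return sum(min(sq_dist(p, c) for c in centres) for p in points)


def cluster_via_uniform_sample(points, k, sample_size, seed=0):
    """Cluster a uniform random sample and return its centres.

    The clustering cost now depends on sample_size rather than len(points).
    """
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))
    return lloyd_kmeans(sample, k, seed=seed)
```

A caller would typically compare `kmeans_cost(points, centres)` for centres obtained from the sample against centres obtained from the full data, to judge how much quality is traded for speed at a given `sample_size`.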
Subsampling algorithms are a natural approach to reduce data size before fitting models on massive d...
Scalable training of Bayesian nonparametric models is a notoriously difficult challenge. We explore ...
We present new algorithms for k-means clustering on a data stream with a focus on providing fast res...
In the era of datasets of unprecedented sizes, data compression techniques are an attractive approac...
Coresets are one of the central methods to facilitate the analysis of large data. We continue a rece...
A coreset is a small set that can approximately preserve the structure of the original input data se...
Coresets are among the most popular paradigms for summarizing data. In particular, there exist many ...
A wide range of optimization problems arising in machine learning can be solved by gradient descent ...
We study the problem of constructing coresets for clustering problems with time series data. This pr...
The focus of this thesis is fast and robust adaptations of logistic regression (LR) for data mining ...
This paper introduces the problem of coresets for regression problems to panel data settings. We firs...
The k-means problem seeks a clustering that minimizes the sum of squared errors cost function: For i...
The massive growth of modern datasets from different sources such as videos, social networks, and se...
Thesis: Ph. D. in Computer Science and Engineering, Massachusetts Institute of Technology, Departmen...