Subsampling algorithms are a natural approach to reducing data size before fitting models on massive datasets. In recent years, several works have proposed methods for subsampling rows from a data matrix while preserving the information relevant for classification. While these works are supported by theory and limited experiments, to date there has been no comprehensive evaluation of these methods. In our work, we directly compare multiple methods for logistic regression drawn from the coreset and optimal subsampling literature and discover inconsistencies in their effectiveness. In many cases, methods do not outperform simple uniform subsampling.
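The baseline against which these methods are compared can be illustrated with a minimal sketch: fit logistic regression by gradient descent on the full data and on a uniform subsample of rows, then compare the coefficient estimates. The synthetic data, the `fit_logreg` helper, and the subsample size `r` below are illustrative assumptions, not taken from any of the papers listed here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic classification data: n rows, d features, known coefficients.
n, d = 20000, 5
X = rng.normal(size=(n, d))
beta_true = np.array([1.5, -2.0, 0.5, 0.0, 1.0])
p = 1.0 / (1.0 + np.exp(-X @ beta_true))
y = rng.binomial(1, p)

def fit_logreg(X, y, steps=500, lr=0.5):
    """Plain gradient descent on the average logistic loss (no intercept)."""
    beta = np.zeros(X.shape[1])
    for _ in range(steps):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))   # predicted probabilities
        grad = X.T @ (mu - y) / len(y)          # gradient of the mean log-loss
        beta -= lr * grad
    return beta

# Full-data fit vs. a fit on a uniform subsample of r rows.
beta_full = fit_logreg(X, y)
r = 1000
idx = rng.choice(n, size=r, replace=False)
beta_sub = fit_logreg(X[idx], y[idx])

print("coefficient gap:", np.linalg.norm(beta_full - beta_sub))
```

On well-conditioned data like this, the uniform-subsample estimate already lands close to the full-data fit, which is the benchmark any coreset or importance-sampling scheme must beat.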
Various regularization techniques are investigated in supervised learning from data. Theoretical fea...
Logistic regression is a widely used statistical method in data analysis and machine learning. When ...
The focus of this thesis is fast and robust adaptations of logistic regression (LR) for data mining ...
Coresets are one of the central methods to facilitate the analysis of large data. We continue a rece...
We address the problem of fast estimation of ordinary least squares (OLS) from large amounts of data...
The bootstrap is a widely used procedure for statistical inference because of its simplicity and att...
We study Nyström type subsampling approaches to large scale kernel methods, and prove learning bo...
In the era of datasets of unprecedented sizes, data compression techniques are an attractive approac...
The coreset paradigm is a fundamental tool for analysing complex and large datasets. Although corese...
This project studies methods of using data subsampling to perform model selection. Most comm...
This paper presents a data pre-processing algorithm to tackle class imbalance in classification prob...
Recently, a classmate working in an insurance company told me he had too large datasets to run simpl...
We compute the breakdown point of the subsampling quantile of a general statistic, and show that it ...