In the era of big data, training complex models on large-scale data sets is challenging, which makes it appealing to reduce data volume by subsampling in order to save computational resources. Most previous work on subsampling uses weighted methods designed to make the performance of the subset-model approach that of the full-set-model; such weighted methods therefore have no chance of producing a subset-model that is better than the full-set-model. This raises the question: how can we achieve a better model with less data? In this work, we propose a novel Unweighted Influence Data Subsampling (UIDS) method, and prove that the subset-model acquired through our method can outperform the full-set-model. Besides, we show that being overly confident in a given test set for sampling ...
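To make the idea of unweighted, influence-guided subsampling concrete, here is a minimal sketch. It is not the UIDS algorithm itself, which the abstract does not spell out. It assumes L2-regularised logistic regression, the standard influence-function approximation of how up-weighting a training point changes the loss on a held-out validation set, and a hypothetical sigmoid rule for turning influence into a keep-probability. All function names, the `l2` value, and the sampling rule are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    # Clip to avoid overflow in exp for extreme logits.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def fit_logreg(X, y, l2=1e-2, iters=50):
    """Fit L2-regularised logistic regression with Newton's method (unweighted)."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(iters):
        p = sigmoid(X @ theta)
        grad = X.T @ (p - y) / n + l2 * theta
        hess = (X.T * (p * (1.0 - p))) @ X / n + l2 * np.eye(d)
        theta = theta - np.linalg.solve(hess, grad)
    return theta

def influence_scores(X, y, X_val, y_val, theta, l2=1e-2):
    """Approximate effect of up-weighting each training point on validation loss.

    phi_i = -g_val^T H^{-1} g_i; a positive phi_i means that up-weighting
    point i is estimated to increase the validation loss.
    """
    n, d = X.shape
    p = sigmoid(X @ theta)
    hess = (X.T * (p * (1.0 - p))) @ X / n + l2 * np.eye(d)
    g_val = X_val.T @ (sigmoid(X_val @ theta) - y_val) / len(y_val)
    s = np.linalg.solve(hess, g_val)      # H^{-1} g_val
    g_train = X * (p - y)[:, None]        # per-sample loss gradients, shape (n, d)
    return -(g_train @ s)

def unweighted_subsample(phi, rng):
    """Hypothetical rule: keep points that look harmful (phi > 0) less often."""
    scale = phi.std() + 1e-12
    keep_prob = sigmoid(-phi / scale)
    return rng.random(len(phi)) < keep_prob
```

A typical use would be to fit on the full training set, score every training point against a validation set, draw the boolean mask, and refit with `fit_logreg(X[keep], y[keep])` without any sample weights. Whether the refitted model actually beats the full-set model depends on the data and on the sampling rule; this sketch makes no such guarantee.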
We address the problem of fast estimation of ordinary least squares (OLS) from large amounts of data...
It is difficult for learning models to achieve high classification performances with imb...
We propose a new regression algorithm that learns from a set of input-output pairs. Our algorithm is...
Learning Compact High-Dimensional Models in Noisy Environments. Building compact, interpretable sta...
42 pages. This project studies methods of using data subsampling to perform model selection. Most comm...
In modern statistical applications, we are often faced with situations where there is either too litt...
Recently, a classmate working in an insurance company told me he had datasets too large to run simpl...
A dataset is said to be imbalanced when its classes are disproportionately represented in terms of t...
© Springer Nature Singapore Pte Ltd. 2018. Imbalanced classification problem is an enthusiastic topi...
Ensemble methods using the same underlying algorithm trained on different subsets of observations ha...
There is an unprecedented amount of data available. This has caused knowledge discovery to garner at...
To tackle massive data, subsampling is a practical approach to select the more informative data poin...
As technology evolves, big data bring us great opportunities to identify patterns which were infeasi...