Finding a dataset of minimal cardinality that characterizes the optimal parameters of a model is of paramount importance in machine learning and in distributed optimization over a network. This paper investigates the compressibility of large datasets. More specifically, we propose a framework that jointly learns the input-output mapping and the most representative samples of the dataset (the sufficient dataset). Our analytical results show that the cardinality of the sufficient dataset grows sub-linearly with the size of the original dataset. Numerical evaluations on real datasets reveal high compressibility, up to 95%, without a noticeable drop in learning performance as measured by the generalization error.
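The notion of a sufficient dataset can be made concrete with a small experiment. The sketch below is not the framework proposed in the paper. It swaps in a simple greedy, coreset-style heuristic for a one-dimensional linear least-squares model, under two stated assumptions: a subset counts as "sufficient" if the model fitted on it attains a full-data mean squared error within a factor (1 + eps) of the full-data optimum, and the subset is grown by alternating between fitting the model on the current subset and adding the sample that model fits worst. The names `fit_line`, `mse`, `greedy_sufficient_subset`, `compressibility`, `eps`, and `atol` are hypothetical and chosen for illustration only.

```python
import random


def fit_line(points):
    """Ordinary least squares for y = a*x + b on a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx  # requires at least two distinct x values
    b = mean_y - a * mean_x
    return a, b


def mse(params, points):
    """Mean squared error of the line `params` over `points`."""
    a, b = params
    return sum((y - (a * x + b)) ** 2 for x, y in points) / len(points)


def greedy_sufficient_subset(points, eps=0.05, atol=1e-12):
    """Grow a subset until its fitted model is near-optimal on the FULL dataset.

    Hypothetical stand-in for a "sufficient dataset" (not the paper's method):
    stop once the subset's model reaches a full-data loss within (1 + eps) of
    the full-data optimum. Assumed greedy rule: add the worst-fitted sample.
    """
    full_loss = mse(fit_line(points), points)
    # Seed with the samples of smallest and largest x so the first fit is defined.
    order = sorted(range(len(points)), key=lambda i: points[i][0])
    chosen = {order[0], order[-1]}
    while True:
        params = fit_line([points[i] for i in chosen])
        near_optimal = mse(params, points) <= (1 + eps) * full_loss + atol
        if near_optimal or len(chosen) == len(points):
            return sorted(chosen), params
        a, b = params
        # Add the unchosen sample with the largest squared residual.
        worst = max(
            (i for i in range(len(points)) if i not in chosen),
            key=lambda i: (points[i][1] - (a * points[i][0] + b)) ** 2,
        )
        chosen.add(worst)


def compressibility(n_subset, n_full):
    """Fraction of the original dataset that can be discarded (assumed definition)."""
    return 1 - n_subset / n_full


if __name__ == "__main__":
    rng = random.Random(0)
    data = [(i / 200, 2 * (i / 200) + 1 + rng.gauss(0, 0.1)) for i in range(200)]
    subset, params = greedy_sufficient_subset(data, eps=0.05)
    print(len(subset), params, compressibility(len(subset), len(data)))
```

Here compressibility is taken, as an assumption, to be the fraction of samples discarded, 1 - |subset|/|dataset|, so a value of 0.95 would mean keeping 5% of the data. On noise-free linear data the two seed points already suffice; with noisy data the subset size depends on eps and the noise level. This toy example says nothing about the sub-linear growth established analytically in the paper.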
Big data comes in various ways, types, shapes, forms and sizes. Indeed, almost all areas of science,...
Unsupervised learning involves inferring the inherent structures or patterns from unlabeled data. Si...
Generative networks implicitly approximate complex densities from their sampling with impressive acc...
The last few years have witnessed the rise of the big data era, which features the prevalence of dat...
Pervasive and networked computers have dramatically reduced the cost of collecting and distributing ...
Massive high-dimensional data sets are ubiquitous in all scientific disciplines. Extracting meaningf...
ML systems contend with an ever-growing processing load of physical world data. These systems are ...
This paper reviews the appropriateness for application to large data sets of standard machine learni...
This article considers "compressive learning," an approach to large-scale machine learning where dat...
Progress in Machine Learning is being driven by continued growth in model size, training data and al...
Traditional machine learning has been largely concerned with developing techniques for small or mode...
Methods that analyze large-scale data and make predictions based on data are increasingly prevalent ...
University of Technology, Sydney. Faculty of Engineering and Information Technology. There has been a...
The rapid development of modern information technology has significantly facilitated the generation,...