The advent of large-scale datasets has offered unprecedented amounts of information for building statistically powerful machines, but it has also introduced a formidable computational challenge: how can we efficiently process massive data? This thesis presents a suite of data reduction methods that make learning algorithms scale on large datasets by extracting a succinct, model-specific representation that summarizes the full data collection, known as a coreset. Our frameworks support datasets of arbitrary dimensionality by design and can be used for general-purpose Bayesian inference under real-world constraints, including privacy preservation and robustness to outliers, encompassing diverse uncertainty-aware data analysis tasks, such ...
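To make the coreset idea concrete, the following is a minimal sketch, not the thesis's actual construction: it summarizes N data points by a small weighted subset whose weighted log-likelihood approximates the full-data log-likelihood. It uses uniform subsampling with weights N/m as the simplest baseline; the one-dimensional Gaussian model and all names here are illustrative assumptions.

    # Minimal coreset sketch: uniform subsampling with importance weights.
    # Illustrative only; real coreset constructions use more careful
    # (e.g., sensitivity- or sparsity-based) weight selection.
    import numpy as np

    def gaussian_loglik(theta, x, weights=None):
        """(Weighted) log-likelihood of data x under N(theta, 1)."""
        ll = -0.5 * (x - theta) ** 2 - 0.5 * np.log(2 * np.pi)
        if weights is None:
            return float(ll.sum())
        return float(weights @ ll)

    rng = np.random.default_rng(0)
    N, m = 100_000, 500                      # full data size, coreset size
    x = rng.normal(loc=2.0, scale=1.0, size=N)

    idx = rng.choice(N, size=m, replace=False)
    coreset_x = x[idx]
    coreset_w = np.full(m, N / m)            # weights sum to N, so the
                                             # weighted sum is unbiased

    theta = 2.0
    print(f"full log-lik:    {gaussian_loglik(theta, x):.1f}")
    print(f"coreset log-lik: {gaussian_loglik(theta, coreset_x, coreset_w):.1f}")

Any downstream inference routine (e.g., MCMC) can then evaluate the weighted coreset log-likelihood in place of the full-data one, at a fraction of the per-iteration cost.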
Recent years have witnessed increasing interest among researchers in protecting individual privacy i...
For an extended version of this article that contains additional references and more in-depth discus...
We present a systematic refactoring of the conventional treatment of privacy analyses, basing it on ...
Standard Bayesian inference algorithms are prohibitively expensive in the regime of modern large-sca...
Modern machine learning increasingly involves personal data, such as healthcare, financial and user ...
Large-scale data processing prompts a number of important challenges, including guaranteeing that co...
Here we consider a common data encryption problem encountered by users who want to disclose some dat...
Behavioral data, collected from our daily interactions with technology, have driven scientific advan...
The increasing size and complexity of datasets have accelerated the development of machine learning ...
This work addresses the problem of learning from large collections of data with privacy guarantees. ...
Imagine a collection of individuals who each possess private data that they do not wish to share wit...
Sensitive information in user data, such as health status, financial history, and personal preferenc...
The ubiquitous need for analyzing privacy-sensitive information, including health records,...