AI-based data synthesis has seen rapid progress over the last several years and is increasingly recognized for its promise to enable privacy-respecting high-fidelity data sharing. This is reflected by the growing availability of both commercial and open-sourced software solutions for synthesizing private data. However, despite these recent advances, adequately evaluating the quality of generated synthetic datasets is still an open challenge. We aim to close this gap and introduce a novel holdout-based empirical assessment framework for quantifying the fidelity as well as the privacy risk of synthetic data solutions for mixed-type tabular data. Measuring fidelity is based on statistical distances of lower-dimensional marginal distributions, ...
This is the final version. Available on open access from SAGE Publications via the DOI in this recor...
Machine Learning (ML) is accelerating progress across fields and industries, but relies on accessibl...
Existing private synthetic data generation algorithms are agnostic to downstream tasks. However, end...
Synthetic data has gained significant momentum thanks to sophisticated machine learning tools that e...
Synthetic data has been advertised as a silver-bullet solution to privacy-preserving data publishing...
Synthetic data generation is a powerful tool for privacy protection when considering public release ...
With the recent advances and increasing activities in data mining and analysis, the protection of th...
How can we share sensitive datasets in such a way as to maximize utility while simultaneously safegu...
In a world where artificial intelligence and data science become omnipresent, data sharing is increa...
Clinical data analysis could lead to breakthroughs. However, clinical data contain sensitive informa...
Differential privacy allows quantifying privacy loss resulting from accession of sensitive personal ...
These talks were presented for the Privacy Day Webinar 2022 sponsored by the American Statistical As...
With ever increasing capacity for collecting, storing, and processing of data, there is also a high ...
This explainer document aims to provide an overview of the current state of the rapidly expanding wo...
We consider the problem of enhancing user privacy in common data analysis and machine learning devel...