Existing private synthetic data generation algorithms are agnostic to downstream tasks. However, end users may have specific requirements that the synthetic data must satisfy. Failure to meet these requirements could significantly reduce the utility of the data for downstream use. We introduce a post-processing technique that improves the utility of the synthetic data with respect to measures selected by the end user, while preserving strong privacy guarantees and dataset quality. Our technique involves resampling from the synthetic data to filter out samples that do not meet the selected utility measures, using an efficient stochastic first-order algorithm to find optimal resampling weights. Through comprehensive numerical experiments, we ...
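The resampling idea in the abstract above can be illustrated with a minimal sketch. It makes several assumptions that are not in the source: the utility measures are averages of per-sample functions, their target values are already available (for example as privately released estimates), the targets are matched exactly, and the weights are chosen to stay as close as possible to uniform in KL divergence. Plain full-batch gradient descent on the dual problem stands in here for the stochastic first-order algorithm the abstract mentions, and the names `resampling_weights` and `resample` are hypothetical.

```python
import numpy as np


def _tilt(D, lam):
    # Exponentially tilted weights w_i proportional to exp(D_i . lam),
    # computed with a max-shift for numerical stability.
    logits = D @ lam
    logits = logits - logits.max()
    w = np.exp(logits)
    return w / w.sum()


def resampling_weights(Q, targets, lr=0.5, steps=2000):
    """Find weights whose weighted means of the measures match `targets`.

    Q[i, j] is the value of (hypothetical) utility measure j on synthetic
    sample i; `targets` holds the desired value of each weighted mean.
    Assumes the targets are attainable by some reweighting of the samples.
    """
    Q = np.asarray(Q, dtype=float)
    D = Q - np.asarray(targets, dtype=float)  # deviation from the targets
    lam = np.zeros(D.shape[1])                # dual variables, one per measure
    for _ in range(steps):
        w = _tilt(D, lam)
        grad = w @ D        # gradient of log-mean-exp(D @ lam) w.r.t. lam
        lam -= lr * grad    # full-batch step (a simplification, see lead-in)
    return _tilt(D, lam)


def resample(weights, size, seed=0):
    # Draw sample indices with replacement according to the weights.
    rng = np.random.default_rng(seed)
    return rng.choice(len(weights), size=size, p=weights)


# Toy usage: one binary measure that is 1 on roughly 30% of the synthetic
# samples, while the (assumed) target rate is 50%.
rng = np.random.default_rng(0)
Q = (rng.random((1000, 1)) < 0.3).astype(float)
w = resampling_weights(Q, targets=[0.5])
idx = resample(w, size=1000)
print("weighted mean:", float(w @ Q[:, 0]))
print("resampled mean:", float(Q[idx, 0].mean()))
```

At a stationary point of the dual the gradient `w @ D` vanishes, which is exactly the condition that the weighted means equal the targets. Samples that push a measure away from its target receive small weights and are rarely drawn in the resampling step.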
Advances in computation have created high demand for large datasets, which in turn has sparked inter...
How can we share sensitive datasets in such a way as to maximize utility while simultaneously safegu...
Synthetic data generation is a powerful tool for privacy protection when considering public release ...
In a world where artificial intelligence and data science become omnipresent, data sharing is increa...
Despite several works that succeed in generating synthetic data with differential privacy (DP) guara...
Increasing interest in privacy-preserving machine learning has led to new and evolved approaches for...
We present three new algorithms for constructing differentially private synthetic data—a sanitized v...
The U.S. Census Longitudinal Business Database (LBD) product contains employment and payroll informa...
AI-based data synthesis has seen rapid progress over the last several years and is increasingly reco...
Synthetic data has gained significant momentum thanks to sophisticated machine learning tools that e...
Differentially private data generation techniques have become a promising solution to the data priva...
Differential privacy allows quantifying privacy loss resulting from accession of sensitive personal ...
With the recent advances and increasing activities in data mining and analysis, the protection of th...
We consider the problem of enhancing user privacy in common data analysis and machine learning devel...
In order to comply with data confidentiality requirements, while meeting usability needs fo...