We present three new algorithms for constructing differentially private synthetic data—a sanitized version of a sensitive dataset that approximately preserves the answers to a large collection of statistical queries. All three algorithms are oracle-efficient in the sense that they are computationally efficient when given access to an optimization oracle. Such an oracle can be implemented using many existing (non-private) optimization tools such as sophisticated integer program solvers. While the accuracy of the synthetic data is contingent on the oracle’s optimization performance, the algorithms satisfy differential privacy even in the worst case. For all three algorithms, we provide theoretical guarantees for both accuracy and privacy. Th...
The U.S. Census Longitudinal Business Database (LBD) product contains employment and payroll informa...
Evaluating the performance of database systems is crucial when database vendors or research...
The availability of large amounts of informative data is crucial for successful machine learning. Ho...
We present new theoretical results on differentially private data release useful with respect to any...
Differential privacy is now the de facto industry standard for ensuring privacy while publicly relea...
Existing private synthetic data generation algorithms are agnostic to downstream tasks. However, end...
Computing technologies today have made it much easier to gather personal data, ranging from GPS loca...
As both the scope and scale of data collection increase, an increasingly large amount of sensitive ...
In this thesis, we study when algorithmic tasks can be performed on sensitive data while protecting ...
Releasing sensitive data while preserving privacy is an important problem that has attracted conside...
In a world where artificial intelligence and data science become omnipresent, data sharing is increa...
We demonstrate that, ignoring computational constraints, it is possible to release privacy-preservin...
We show new lower bounds on the sample complexity of (ε, δ)-differentially private algorithms that a...
Many large databases of personal information currently exist in the hands of corporations, nonprofit...
A common goal of privacy research is to release synthetic data that satisfies a formal privacy guara...