Differentially Private (DP) learning has seen limited success for building large deep learning models of text, and attempts at straightforwardly applying Differentially Private Stochastic Gradient Descent (DP-SGD) to NLP tasks have resulted in large performance drops and high computational overhead. We show that this performance drop can be mitigated with (1) the use of large pretrained models; (2) hyperparameters that suit DP optimization; and (3) fine-tuning objectives aligned with the pretraining procedure. With these factors set right, we obtain private NLP models that outperform state-of-the-art private training approaches and strong non-private baselines -- by directly fine-tuning pretrained models with DP optimization on moderately-s...
Differentially private stochastic gradient descent (DP-SGD) has been widely adopted in deep learning...
The pre-training and fine-tuning paradigm has contributed to a number of breakthroughs in Natural La...
While modern machine learning models rely on increasingly large training datasets, data is often lim...
We give simpler, sparser, and faster algorithms for differentially private fine-tuning of large-scal...
As the use of large embedding models in recommendation systems and language applications increases, ...
Preserving privacy in contemporary NLP models allows us to work with sensitive data, but unfortunate...
We study the problem of differentially private (DP) fine-tuning of large pre-trained models -- a rec...
Protecting large language models from privacy leakage is becoming increasingly crucial with their wi...
Differentially Private methods for training Deep Neural Networks (DNNs) have progressed recently, in...
Federated Learning (FL) is a technique to train models using data distributed across devices. Differ...
A well-known algorithm in privacy-preserving ML is differentially private stochastic gradient descen...
Pre-training large transformer models with in-domain data improves domain adaptation and helps gain ...
Training large neural networks with meaningful/usable differential privacy security guarantees is a ...
Per-example gradient clipping is a key algorithmic step that enables practical differential private ...
Existing approaches for training neural networks with user-level differential privacy (e.g., DP Fede...
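Several of the abstracts above hinge on the same core mechanism: DP-SGD clips each example's gradient to a fixed L2 norm and adds calibrated Gaussian noise before applying the averaged update. Below is a minimal sketch of that step in PyTorch, kept deliberately generic; the toy linear model (standing in for a trainable head on a pretrained encoder), the batch size, the clipping bound, and the noise multiplier are illustrative assumptions, not settings taken from any of the cited papers, and a real setup would additionally track the cumulative (epsilon, delta) budget with a privacy accountant.

```python
# Minimal DP-SGD step sketch: per-example gradient clipping + Gaussian noise.
# All hyperparameter values here are placeholders for illustration only.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for a trainable head on top of a (frozen) pretrained encoder;
# any nn.Module with trainable parameters works the same way.
model = nn.Linear(16, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

max_grad_norm = 1.0      # per-example clipping bound C (assumed value)
noise_multiplier = 1.0   # sigma; with the sampling rate and step count it determines (epsilon, delta)

def dp_sgd_step(x_batch, y_batch):
    """One DP-SGD step: clip each example's gradient, sum, add noise, average, apply."""
    params = [p for p in model.parameters() if p.requires_grad]
    summed_grads = [torch.zeros_like(p) for p in params]

    for x, y in zip(x_batch, y_batch):
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)

        # Clip this example's gradient to L2 norm at most max_grad_norm.
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        clip_coef = (max_grad_norm / (total_norm + 1e-6)).clamp(max=1.0)
        for s, g in zip(summed_grads, grads):
            s.add_(g * clip_coef)

    # Add Gaussian noise scaled to the clipping bound, then average over the batch.
    batch_size = len(x_batch)
    for p, s in zip(params, summed_grads):
        noise = torch.randn_like(s) * noise_multiplier * max_grad_norm
        p.grad = (s + noise) / batch_size

    optimizer.step()

# Usage on a random toy batch.
x = torch.randn(8, 16)
y = torch.randint(0, 2, (8,))
dp_sgd_step(x, y)
```

The per-example loop above is the straightforward (and slow) formulation; the works cited above on computational overhead are largely about replacing this loop with vectorized or ghost-clipping style per-example gradient computation while keeping the same clipping-plus-noise semantics.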