We consider transfer learning approaches that fine-tune a pretrained deep neural network on a target task. We study the generalization properties of fine-tuning to understand the problem of overfitting, which commonly arises in practice. Previous works have shown that constraining the distance from the initialization of fine-tuning improves generalization. Using a PAC-Bayesian analysis, we observe that, besides the distance from initialization, the Hessian of the loss also affects generalization through the noise stability of deep neural networks against noise injections. Motivated by this observation, we develop Hessian distance-based generalization bounds for a wide range of fine-tuning methods. Additionally, we study the robustness of fine-tuning in the presence of no...
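As a concrete illustration of the idea of constraining the distance from initialization during fine-tuning, the following is a minimal sketch in PyTorch. It is not the paper's exact method: the regularization coefficient `lambda_reg`, the optimizer, and the training hyperparameters are illustrative placeholders, and the model and data loader are assumed to be supplied by the caller.

```python
import torch

def finetune_with_init_penalty(model, loader, loss_fn, lambda_reg=0.01, lr=1e-4, epochs=3):
    """Fine-tune `model` while penalizing the L2 distance of its weights
    from the pretrained initialization (a sketch, not a specific paper's method)."""
    # Snapshot the pretrained parameters to measure the distance from initialization.
    init_params = [p.detach().clone() for p in model.parameters()]
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)

    for _ in range(epochs):
        for inputs, targets in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), targets)
            # Squared L2 distance from initialization, summed over all parameters.
            dist = sum(((p - p0) ** 2).sum()
                       for p, p0 in zip(model.parameters(), init_params))
            (loss + lambda_reg * dist).backward()
            optimizer.step()
    return model
```

The connection to noise stability referenced above can be seen from a second-order Taylor expansion: under isotropic Gaussian perturbations of the weights, the expected increase in loss is governed (to second order) by the trace of the loss Hessian, which is why Hessian-dependent quantities enter the bounds alongside the distance from initialization.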
In recent years Deep Neural Networks (DNNs) have achieved state-of-the-art results in many fields su...
Recent research in robust optimization has shown an overfitting-like phenomenon in which models trai...
The understanding of generalization in machine learning is in a state of flux. This is partly due to...
Deep learning has transformed computer vision, natural language processing, and speech recognition. ...
Deep neural networks (DNNs...
While there has been progress in developing non-vacuous generalization bounds for deep neural networ...
Existing generalization bounds fail to explain crucial factors that drive generalization of modern n...
In the last decade or so, deep learning has revolutionized entire domains of machine learning. Neura...
A main puzzle of deep networks revolves around the absence of overfitting despite overparametrizatio...
The classical statistical learning theory implies that fitting too many parameters leads to overfitt...
The general features of the optimization problem for the case of overparametrized nonlinear networks...
While significant theoretical progress has been achieved, unveiling the generalization mystery of ov...
Increasing the size of overparameterized neural networks has been shown to improve their generalizat...
Understanding generalization is crucial to confidently engineer and deploy machine learning models, ...
This paper provides theoretical insights into why and how deep learning can generalize well, despite...