We examine the necessity of interpolation in overparameterized models, that is, when achieving optimal predictive risk in machine learning problems requires (nearly) interpolating the training data. In particular, we consider simple overparameterized linear regression $y = X \theta + w$ with random design $X \in \mathbb{R}^{n \times d}$ under the proportional asymptotics $d/n \to \gamma \in (1, \infty)$. We precisely characterize how prediction (test) error necessarily scales with training error in this setting. An implication of this characterization is that as the label noise variance $\sigma^2 \to 0$, any estimator that incurs at least $\mathsf{c}\sigma^4$ training error for some constant $\mathsf{c}$ is necessarily suboptimal and will s...
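As a concrete illustration of this setting, the following minimal sketch (not from the paper) simulates the model $y = X \theta + w$ with a Gaussian random design at aspect ratio $d/n = \gamma > 1$ and evaluates the minimum-$\ell_2$-norm interpolator, one estimator that attains zero training error. The specific constants, the Gaussian design, and the choice of estimator are illustrative assumptions rather than the paper's construction.

```python
import numpy as np

# Illustrative simulation of overparameterized linear regression y = X theta + w
# in the proportional regime d/n = gamma > 1 (all constants below are assumptions).
rng = np.random.default_rng(0)
n, gamma = 500, 2.0                            # sample size and aspect ratio d/n
d = int(gamma * n)                             # overparameterized: d > n
sigma = 0.1                                    # label noise standard deviation

theta = rng.normal(size=d) / np.sqrt(d)        # ground-truth parameter with unit-scale signal
X = rng.normal(size=(n, d))                    # random Gaussian design
y = X @ theta + sigma * rng.normal(size=n)     # noisy labels

# Minimum-l2-norm interpolator: fits the training data exactly when d > n.
theta_hat = np.linalg.pinv(X) @ y

train_err = np.mean((X @ theta_hat - y) ** 2)  # (near) zero by construction
excess_risk = np.sum((theta_hat - theta) ** 2) # excess prediction risk under an isotropic test point

print(f"training error: {train_err:.2e}, excess risk: {excess_risk:.3f}")
```

Under isotropic test features the prediction risk is $\|\hat\theta - \theta\|_2^2 + \sigma^2$, so the printed excess risk is the quantity whose scaling with training error the paper characterizes.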