Neural scaling laws (NSL) refer to the phenomenon where model performance improves with scale. Sharma & Kaplan analyzed NSL using approximation theory and predicted that MSE losses decay as $N^{-\alpha}$, $\alpha=4/d$, where $N$ is the number of model parameters and $d$ is the intrinsic input dimension. Although their theory works well for some cases (e.g., ReLU networks), we surprisingly find that a simple 1D problem $y=x^2$ manifests a different scaling law ($\alpha=1$) from their prediction ($\alpha=4$). We opened up the neural networks and found that the new scaling law originates from lottery ticket ensembling: a wider network on average has more "lottery tickets", which are ensembled to reduce the variance of outputs. We support the ensembling mechanism by mechanistically interpreting single neural networks as well as by studying them statistically.
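The variance-reduction mechanism behind the $\alpha=1$ law can be illustrated with a toy simulation (a sketch for intuition, not the paper's actual experiment): if a wider network behaves like an average of $n$ roughly independent, unbiased "lottery ticket" sub-predictors, then the MSE of the ensemble falls as $n^{-1}$. The helper `noisy_predictor` and the noise scale `sigma` below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target function from the abstract: y = x^2 on [-1, 1].
x = np.linspace(-1.0, 1.0, 201)
y_true = x**2

def noisy_predictor(sigma=0.1):
    # Stand-in for one "lottery ticket": an unbiased but noisy
    # approximation of the target (illustrative assumption).
    return y_true + rng.normal(0.0, sigma, size=x.shape)

# Ensemble n tickets by averaging their outputs and measure the MSE.
ticket_counts = [1, 2, 4, 8, 16, 32, 64]
mses = []
for n in ticket_counts:
    ensemble = np.mean([noisy_predictor() for _ in range(n)], axis=0)
    mses.append(np.mean((ensemble - y_true) ** 2))

# Fit the exponent alpha in MSE ~ n^{-alpha}; averaging independent
# unbiased predictors should give alpha close to 1.
alpha = -np.polyfit(np.log(ticket_counts), np.log(mses), 1)[0]
print(f"fitted scaling exponent alpha = {alpha:.2f}")
```

The fitted slope comes out near 1, mirroring the $N^{-1}$ law: variance of an average of $n$ independent terms shrinks like $1/n$, which is the "central limit theorem of lottery tickets" intuition in miniature.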