We study trends in model size of notable machine learning systems over time using a curated dataset. From 1950 to 2018, model size in language models increased steadily by seven orders of magnitude. The trend then accelerated, with model size increasing by another five orders of magnitude in just four years, from 2018 to 2022. Vision models grew at a more constant pace, totaling seven orders of magnitude of growth between 1950 and 2022. We also identify that, since 2020, there have been many language models below 20B parameters, many models above 70B parameters, but a scarcity of models in the 20-70B parameter range. We refer to that scarcity as the parameter gap. We provide some stylized facts about the parameter gap and propose a few hypotheses to explain it.
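As a concrete illustration of the measurements described above, the following is a minimal sketch of how order-of-magnitude growth and the parameter gap could be computed from a curated dataset. The file name `notable_models.csv` and the columns `year`, `domain`, and `parameters` are hypothetical assumptions for this sketch, not the paper's actual schema.

```python
# Minimal sketch (hypothetical schema): compute order-of-magnitude growth
# in language model size and count models on each side of the parameter gap.
import numpy as np
import pandas as pd

# Assumed columns: year (int), domain (str), parameters (float, raw count).
df = pd.read_csv("notable_models.csv")  # hypothetical file name
lang = df[df["domain"] == "language"]

# Orders of magnitude of growth: log10 ratio of the largest model
# at the end of each period.
size_2018 = lang.loc[lang["year"] <= 2018, "parameters"].max()
size_2022 = lang.loc[lang["year"] <= 2022, "parameters"].max()
print(f"2018-2022 growth: {np.log10(size_2022 / size_2018):.1f} orders of magnitude")

# The parameter gap: count post-2020 language models per size band.
recent = lang[lang["year"] >= 2020]
below = (recent["parameters"] < 20e9).sum()
gap = recent["parameters"].between(20e9, 70e9).sum()  # inclusive bounds
above = (recent["parameters"] > 70e9).sum()
print(f"<20B: {below}, 20-70B: {gap} (the gap), >70B: {above}")
```

With a dataset matching this assumed schema, a pronounced dip in the 20-70B count relative to the neighboring bands is what the abstract calls the parameter gap.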
Large language models (LLMs)—machine learning algorithms that can recognize, summarize, translate,...
The business marketplace has been flooded with waves of technology trends that periodically surface...
Scaling language models with more data, compute and parameters has driven significant progress in na...
We analyze the growth of dataset sizes used in machine learning for natural language processing and ...
Language models demonstrate both quantitative improvement and new qualitative capabilities with incr...
It took until the last decade to finally see a machine match human performance on essentially any ta...
Machine learning (ML), a computational self-learning platform, is expected to be applied in a variet...
Mixture of Experts layers (MoEs) enable efficient scaling of language models through conditional com...
Computational language models (LMs), most notably exemplified by the widespread success of OpenAI's ...
Computational model simulations have been very fruitful for gaining in...
We study the compute-optimal trade-off between model and training data set sizes for large neural ne...
While machine learning is traditionally a resource intensive task, embedded systems, autonomous navi...
Neural scaling laws define a predictable relationship between a model's parameter count and its perf...
Despite their wide adoption, the underlying training and memorization dynamics of very large languag...
Modern language models leverage increasingly large numbers of parameters to achieve performance on n...