The dependence on text length of the statistical properties of word occurrences has long been considered a severe limitation on the usefulness of quantitative linguistics. We propose a simple scaling form for the distribution of absolute word frequencies that reveals the robustness of this distribution as the text grows. Under this form, the shape of the distribution stays the same and only a scale parameter changes, increasing linearly with text length. By analyzing very long novels, we show that this behavior holds both for raw, unlemmatized texts and for lemmatized texts. In the latter case, the word-frequency distribution is well fit by a double power law, maintaining Zipf's exponent value $\gamma \simeq 2$ for large frequen...
Written text is one of the fundamental manifestations of human language, and the study of its univer...
We investigate the origin of Zipf's law for words in written texts by means of a stochastic dynamic ...
An important body of quantitative linguistics is constituted by a series of statistical laws about l...
Some authors have recently argued that a finite-size scaling law for the text-length dependence of w...
Zipf's law is a fundamental paradigm in the statistics of written and spoken natural language as wel...
Despite being a paradigm of quantitative linguistics, Zipf's law for words suffers from three mai...
The formation of sentences is a highly structured and history-dependent process. The probability of ...
The frequency of words and letters in bodies of text has been heavily studied for several purposes, ...
Quantitative linguistics has provided us with a number of empirical laws that characterise the evolu...