We study the relationship between vocabulary size and text length in a corpus of 75 literary works in English, authored by six writers, distinguishing between the contributions of three grammatical classes (or ``tags,'' namely, nouns, verbs, and others), and analyze the progressive appearance of new words of each tag along each individual text. We find that, as prescribed by Heaps' law, vocabulary sizes and text lengths follow a well-defined power-law relation. Meanwhile, the appearance of new words in each text does not obey a power law, and is on the whole well described by the average of random shufflings of the text. Deviations from this average, however, are statistically significant and show systematic trends across the corpus. Specif...
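The two measurements described in this abstract can be illustrated with a minimal sketch: the within-text vocabulary growth curve compared against the average over random shufflings, and a log-log least-squares fit of vocabulary size against text length across texts. The function names, the number of shufflings, and the use of pre-tokenized word lists are assumptions made for illustration; the sketch ignores the split into grammatical classes and is not the authors' actual procedure.

```python
import math
import random


def vocabulary_growth(tokens):
    """Return V(n), the number of distinct words among the first n tokens, for n = 1..N."""
    seen = set()
    curve = []
    for tok in tokens:
        seen.add(tok)
        curve.append(len(seen))
    return curve


def shuffled_average(tokens, n_shuffles=100, seed=0):
    """Average the growth curve over random shufflings of the same tokens.

    n_shuffles and seed are illustrative choices, not values from the source.
    """
    rng = random.Random(seed)
    work = list(tokens)
    total = [0.0] * len(work)
    for _ in range(n_shuffles):
        rng.shuffle(work)
        for i, v in enumerate(vocabulary_growth(work)):
            total[i] += v
    return [t / n_shuffles for t in total]


def fit_power_law(lengths, vocab_sizes):
    """Least-squares fit of log V = log K + beta * log N; returns (K, beta)."""
    xs = [math.log(n) for n in lengths]
    ys = [math.log(v) for v in vocab_sizes]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    beta = sxy / sxx
    return math.exp(my - beta * mx), beta


if __name__ == "__main__":
    # Toy text, tokenized by whitespace (an assumption; real texts need proper tokenization).
    tokens = "the cat saw the dog and the dog saw the cat".split()
    observed = vocabulary_growth(tokens)
    baseline = shuffled_average(tokens)
    # Deviation of the observed curve from the shuffled average at each position.
    deviation = [o - b for o, b in zip(observed, baseline)]
    print(observed)
    print([round(d, 2) for d in deviation])
```

Shuffling preserves the word frequencies of a text but destroys their ordering, so the difference between the observed curve and the shuffled average isolates the effect of word order on the appearance of new words. Both curves necessarily agree at the first and last positions.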
The large amount of digitized linguistic data opens up the unique possibility of using the me...
Natural language is a remarkable example of a complex dynamical system which combines variation and ...
The formation of sentences is a highly structured and history-dependent process. The probability of ...
This paper is devoted to verifying the empirical Zipf and Heaps laws in natural languages using Go...
Written text is one of the fundamental manifestations of human language, and the study of its univer...
The dependence on text length of the statistical properties of word occurrences has long been consid...
Heaps' law is an empirical relation in text analysis that predicts vocabulary growth as a function o...
We analyze the frequency–rank relationship in sub-vocabularies corresponding to three different gram...
We analyze the occurrence frequencies of over 15 million words recorded in millions of books publish...
Evidence is presented for a systematic text-length dependence of the power-law index gamma of a sing...