It has been argued that the actual distribution of word frequencies could be reproduced or explained by generating a random sequence of letters and spaces according to the so-called intermittent silence process. The same kind of process could reproduce or explain the counts of other kinds of units from a wide range of disciplines. Adopting the linguistic metaphor, we focus on two quantities of a text generated by an intermittent silence process: the frequency spectrum, i.e., the number of words with a certain frequency, and the vocabulary size, i.e., the number of different words. We derive, and explain how to calculate accurately and efficiently, the expected frequency spectrum and the expected vocabulary size as a function of the text size.
Peer Reviewed
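The two quantities defined above can be illustrated with a small simulation. The sketch below is not the analytical calculation the abstract refers to; it only generates one random text and counts what comes out. The function names, the two-letter alphabet, the space probability `p_space`, the uniform choice of letters, and the decision to discard empty words produced by consecutive spaces are all assumptions made for illustration.

```python
import random
from collections import Counter


def generate_words(num_words, alphabet="ab", p_space=0.2, seed=0):
    """Simulate an intermittent silence process until num_words words exist.

    Assumptions (not taken from the source): letters are equally likely,
    a space is produced with probability p_space, and empty words
    (two spaces in a row) are discarded.
    """
    rng = random.Random(seed)
    words = []
    current = []
    while len(words) < num_words:
        if rng.random() < p_space:
            # A space ends the current word, if there is one.
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(rng.choice(alphabet))
    return words


def frequency_spectrum(words):
    """Map each frequency f to the number of distinct words occurring f times."""
    word_counts = Counter(words)
    return Counter(word_counts.values())


def vocabulary_size(words):
    """Number of different words in the text."""
    return len(set(words))


words = generate_words(1000)
spectrum = frequency_spectrum(words)
vocab = vocabulary_size(words)
```

Averaging `spectrum` and `vocab` over many seeds would give a Monte Carlo estimate of the expected values, which could serve as a rough check on an exact calculation. Two identities hold for any text: the spectrum entries sum to the vocabulary size, and the frequencies weighted by their spectrum entries sum to the text size in words.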
Brevity and frequency are two crucial factors in the processes of statistical learning. The compress...
Sidestepping the combinatorial explosion: An explanation of n-gram frequency effects based on naive ...
The frequency of words and letters in bodies of text has been heavily studied for several purposes, ...
Paper presented at the 5th Strathmore International Mathematics Conference (SIMC 2019), 12 - 16 Augu...
Written text is one of the fundamental manifestations of human language, and the study of its univer...
In a critical review of the heuristics used to deal with zero word frequencies, we show that four ar...
The results of a tabulation of word frequencies in a sample of written English are analyzed in terms...
Comparing frequency counts over texts or corpora is an important task in many applications and scien...
Recently, it was demonstrated that generalized entropies of order α offer novel and important opport...
The formation of sentences is a highly structured and history-dependent process. The probability of ...
BACKGROUND: Zipf's discovery that word frequency distributions obey a power law established parallel...
Here, assuming a general communication model where objects map to signals, a power function for the ...
The dependence on text length of the statistical properties of word occurrences has long been consid...