We show how generalized Gibbs-Shannon entropies can provide new insights into the statistical properties of texts. The universal distribution of word frequencies (Zipf's law) implies that the generalized entropies, computed at the word level, are dominated by words in a specific range of frequencies. Here we show that this is the case not only for the generalized entropies but also for the generalized (Jensen-Shannon) divergences used to compute the similarity between different texts. This finding allows us to identify the contribution of specific words (and word frequencies) to the different generalized entropies and also to estimate the size of the databases needed to obtain a reliable estimate of the divergences. We test our results in...
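The quantities discussed above can be made concrete with a short sketch. The snippet below computes an order-α entropy and a Jensen-Shannon-type divergence built from it for two word-frequency distributions. As an illustrative assumption (the abstract does not fix the functional form), the Rényi entropy is used as the generalized entropy of order α; the helper names `word_probs` and `generalized_jsd` are hypothetical, not from the paper.

```python
from collections import Counter
import math

def renyi_entropy(probs, alpha):
    """Renyi entropy of order alpha in bits; reduces to Shannon as alpha -> 1."""
    probs = [p for p in probs if p > 0]
    if abs(alpha - 1.0) < 1e-9:  # Shannon limit
        return -sum(p * math.log2(p) for p in probs)
    return math.log2(sum(p ** alpha for p in probs)) / (1.0 - alpha)

def word_probs(text):
    """Empirical word-frequency distribution of a text (whitespace tokens)."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def generalized_jsd(p, q, alpha):
    """Jensen-Shannon-type divergence of order alpha between two
    word distributions: H_alpha of the mixture minus the mean H_alpha."""
    vocab = set(p) | set(q)
    m = [(p.get(w, 0.0) + q.get(w, 0.0)) / 2 for w in vocab]
    return (renyi_entropy(m, alpha)
            - 0.5 * renyi_entropy(list(p.values()), alpha)
            - 0.5 * renyi_entropy(list(q.values()), alpha))
```

For α = 1 this recovers the ordinary Jensen-Shannon divergence; varying α reweights the contribution of rare versus frequent words, which is the effect the abstract analyzes under Zipf's law.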
Written text is one of the fundamental manifestations of human language, and the study of its univer...
Shannon estimates the entropy of the set of words in printed English as 11.82 bits per word. As this...
Recently, it was demonstrated that generalized entropies of order α offer novel and important opport...
Quantifying the similarity between symbolic sequences is a traditional problem in information theory...
We estimate the n-gram entropies of natural language texts in word-length representation and find th...
The word-frequency distribution provides the fundamental building blocks that generate discourse in ...
Zipf’s law has intrigued people for a long time. This distribution models a ce...
The choice associated with words is a fundamental property of natural languages. It lies at the hear...
Despite being a paradigm of quantitative linguistics, Zipf's law for words suffers from three mai...
The frequency of words and letters in bodies of text has been heavily studied for several purposes, ...