Natural language is a remarkable example of a complex dynamical system that combines variation and universal structure emerging from the interaction of millions of individuals. Understanding the statistical properties of texts is not only crucial in applications of information retrieval and natural language processing, e.g. search engines, but also allows deeper insights into the organization of knowledge in the form of written text. In this thesis, we investigate the statistical and dynamical processes underlying the coexistence of universality and variability in word statistics. We combine a careful statistical analysis of large empirical databases on language usage with analytical and numerical studies of stochastic models.
We propose a stochastic model for the number of different words in a given database which incorporat...
Human language evolved by natural mechanisms into an efficient system capable of coding and transmit...
The recent dramatic increase in online data availability has allowed researchers to explore human cu...
Written text is one of the fundamental manifestations of human language, and the study of its univer...
Statistical studies of languages have focused on the rank-frequency distribution of words. Instead, ...
Comparing frequency counts over texts or corpora is an important task in many applications and scien...
BACKGROUND: Zipf's discovery that word frequency distributions obey a power law established parallel...
Zipf's law is just one out of many universal laws proposed to describe statistical regularities in l...
We analyze the occurrence frequencies of over 15 million words recorded in millions of books publish...
Recently, it was demonstrated that generalized entropies of order α offer novel and important opport...
This paper takes a system dynamic approach to study homogeneous texts where the dynamics of the lexi...