In this paper we study the distribution of words across the different parts of a book using tools from information theory. In particular, the mutual information between the words of the text and the parts of the text is compared with the mutual information obtained from a shuffled version of the book. This analysis allows us to extract not only the relevant words of the text but also relationships between words, such as co-occurrence and repulsion. Using the connections due to co-occurrence of words, we show how to construct a network that reflects the semantic organization of the book. The method can be applied to other types of sequences to measure the relations between the symbols that compose them.
Fil: Hernández ...
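The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the book is already tokenized into a list of words, that the "parts" are contiguous chunks of equal size, and that the shuffled baseline is an average over a few random permutations. The function names (`split_into_parts`, `word_part_information`, `excess_information`) and the parameters `n_parts`, `n_shuffles`, and `seed` are hypothetical choices made for this example.

```python
import math
import random
from collections import Counter, defaultdict


def split_into_parts(tokens, n_parts):
    """Split a token list into n_parts contiguous chunks of nearly equal size."""
    n = len(tokens)
    return [tokens[i * n // n_parts:(i + 1) * n // n_parts] for i in range(n_parts)]


def word_part_information(tokens, n_parts):
    """Return each word's contribution (in bits) to I(word; part).

    Summing the returned values over all words gives the mutual information
    between the word variable and the part variable.
    """
    total = len(tokens)
    word_counts = Counter(tokens)
    info = defaultdict(float)
    for part in split_into_parts(tokens, n_parts):
        p_part = len(part) / total
        for word, count in Counter(part).items():
            p_joint = count / total
            p_word = word_counts[word] / total
            info[word] += p_joint * math.log2(p_joint / (p_word * p_part))
    return info


def excess_information(tokens, n_parts, n_shuffles=20, seed=0):
    """Per-word information in the real text minus the shuffled-text average.

    Averaging over n_shuffles permutations is an assumption of this sketch.
    """
    rng = random.Random(seed)
    real = word_part_information(tokens, n_parts)
    baseline = defaultdict(float)
    shuffled = list(tokens)
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        for word, value in word_part_information(shuffled, n_parts).items():
            baseline[word] += value / n_shuffles
    return {word: real[word] - baseline[word] for word in real}


# Toy "book": "whale" appears only in the first half, "ship" only in the
# second half, and "the" is spread evenly over both halves.
tokens = ["whale", "the"] * 20 + ["ship", "the"] * 20
excess = excess_information(tokens, n_parts=2)
```

In this toy input, a word confined to one part keeps a clearly positive excess over the shuffled baseline, while an evenly spread word does not. Words ranked by this excess are candidates for the "relevant words" that the abstract mentions. The co-occurrence network would require a further pairwise step that is not sketched here.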
Written text is one of the fundamental manifestations of human language, and the study of its univer...
While the use of statistical physics methods to analyze large corpora has been useful to unveil many...
Written language is a complex communication signal capable of conveying information encoded in the f...
We review some recent progress on the characterisation of long-range patterns of word use in languag...
The use of methods borrowed from statistics and physics to analyze written texts has allowed the dis...
Semantic similarity measurement aims to determine the likeness between two text expressions that use...
The Voynich manuscript has remained so far as a mystery for linguists and cryptologists. While the t...
Here we show that the recently reported presence of long-range correlations in the distribution of w...
The structure of written texts is analyzed by focusing on word sequences. As a method, word sequence...
This thesis concerns the notion of 'information structure': informally, organization of in...
Many features of texts and languages can now be inferred from statistical analyses using concepts fr...
Abstract: An analysis of linguistic approaches to determining the lexical cohesion in text reveals d...
We study the correlation properties of word lengths in large texts from 30 ebooks in the English lan...