Data to support calculations in chapter 1 of the book _Distant Horizons._ It includes word counts for volumes of fiction and biography, 1700-2000. For metadata and code, see the GitHub repository (https://github.com/tedunderwood/horizon/tree/master/chapter1).
A compressed folder containing 31075 numerical vectors. Each one represents word frequencies of a...
A popular form of term weighting in texts is to use TF*IDF, which takes a text's term frequencies an...
A topic model of 29,341 volumes of fiction, written in English and published between 1880 and 1999. ...
Data to support calculations in chapter 2 of the book _Distant Horizons._ It includes word counts fo...
Data to support calculations in chapter 3 of the book _Distant Horizons._ It includes word counts fo...
Data and code supporting the book Distant Horizons, by Ted Underwood, to be published by University ...
A zipped folder of files keyed to HathiTrust volume IDs, each representing a volume of English-langu...
Data to support chapter 4 of the book _Distant Horizons._ This includes mostly lists of words associ...
Tab-separated files containing wordcounts from volumes of fiction. The names of the files are keyed ...
The chapters dataset includes information about all published book chapters from Springer Nature. Th...
Corpus-level term statistics are valuable for numerous text analysis activities, such as term weight...
Number of words per chapter in the dataset for each translator of Don Quixote.
Data discussed in the manuscript "How to Read a Million Books: An Introduction to Data Analysis for ...
The books dataset includes information about all published books from Springer Nature. This dataset ...