Data to support calculations in chapter 2 of the book _Distant Horizons._ It includes word counts for volumes of fiction, especially Gothic, detective, and science fiction from the nineteenth and twentieth centuries. For the argument about genre founded on this data, see the book and the supporting code repository: https://github.com/tedunderwood/horizon/tree/master/chapter2.
Metadata for 774 works of fiction referenced in "Mapping Mutable Genres in Structurally Complex Volu...
This dataset includes derived data on a collection of ca. 2,700 books in English published between 2...
Using regularized logistic regression and hidden Markov models, we predict genre at the page level i...
Data to support calculations in chapter 1 of the book _Distant Horizons._ It includes word counts fo...
Data to support calculations in chapter 3 of the book _Distant Horizons._ It includes word counts fo...
A zipped folder of files keyed to HathiTrust volume IDs, each representing a volume of English-langu...
A topic model of 29,341 volumes of fiction, written in English and published between 1880 and 1999. ...
Corpus-level term statistics are valuable for numerous text analysis activities, such as term weight...
Metadata for English-language fiction in HathiTrust Digital Library, after 1922. These volumes were ...
An initial version of the data paper accompanying these datasets is published on my open notebook, P...
This workset is data in support of the article "Mapping Mutable Genres in Structurally Complex Volum...
Code and data to support analysis reported in referenced article. Raw texts not included, but derive...
Data discussed in the manuscript "How to Read a Million Books: An Introduction to Data Analysis for ...
Corpora have been used as tools in literature and linguistics for many years, allowing linguist...
A popular form of term weighting in texts is to use TF*IDF, which takes a text's term frequencies an...