In the previous blog post, we described how we retrieved Chinese biographies using a machine-learning classifier. Since then, we have updated the collection with English biographies that are linked to the Chinese pages, bringing the total number of biographies to 338,857 (228,144 in Chinese and 110,713 in English), and derived two forms of metadata from the texts: named entities and inter-language links. Named entities are text elements that constitute everything that can be referred to w..
The compilation is a project of the Division of Orientalia of the Library of Congress, sponsored by ...
Difangzhi (地方志) is a large collection of local gazetteers compiled by local governments of China, a...
Biographical database, produced jointly by Harvard University, Academia Sinica and...
With the rise of digital humanities, historians explore how to intellectually engage with textual so...
In the last few months we tried to build a corpus based on the biographies of the Chinese Wikipedia....
Folder containing biographies from Wikipedia. The folder is composed of two subfolders, one containi...
Generating factual, long-form text such as Wikipedia articles raises three key...
We automatically create enormous, free and multilingual silver-standard training annotations...
A Gensim LDA topic model of English Wikipedia biographical articles with a list of all 1.8M articles, ...
We add to the literature on notable individuals (famous, prominent, distinguis...
This paper argues for making a paradigm shift in publishing and using biographical dictionaries on t...
We present the Pantheon 1.0 dataset: a manually verified dataset of individuals that have transcende...
It is arguable whether history is made by great men and women or vice versa, but undoubtedly social ...
The linked repository contains the code along with the required corpora that were used in order to b...