WDumper is a third-party tool that enables users to create custom dumps of Wikidata. Here we use WDumper to create several topical subsets of Wikidata. There are four use cases: (1) politicians: people with the occupation of politician in Wikidata; (2) militPoliticians: people with the occupation of politician who are military personnel and also hold the military rank of General in Wikidata; (3) ukUniversities: all United Kingdom universities in Wikidata; (4) geneWiki: a subset from this class-diagram. The subsets are in .nt.gz format. For each use case there are two types of subsets in terms of content: one with references and qualifiers, which has "withRQFS" in its name, and one without them. Also, for each use case, there are two extracted subsets, one from 27 April 2015 and an...
Source file: GeneTaxon_wikidata-20220630-all.ttl.gz ShEx: https://github.com/kg-subsetting/paper-wi...
We maintain a wiki comparison dataset (which we used to call a wiki segmentation dataset) to show a ...
Peer reviewed. We introduce WikiDoMiner -- a tool for automatically generating domain-specific corpora...
Files in this dataset have been produced during Flexibility experiments of Wikidata subsetting pract...
Files in this dataset have been produced during Performance and Accuracy experiments of Wikidata sub...
This dataset consists of the complete revision history of every instance of the 100 most important clas...
Wikidata is the newest project of the Wikimedia Foundation (WMF), the non-profit U.S.-based foundati...
We introduce WikiDoMiner - a tool for automatically generating domain-specific corpora by crawling W...
We introduce WikiDoMiner -- a tool for automatically generating domain-specific corpora by crawling...
Source file: GeneTaxon_wikidata-20190121-all.ttl.gz ShEx: https://github.com/kg-subsetting/paper-wi...
Source file: GeneTaxon_wikidata-20160613-all.ttl.gz ShEx: https://github.com/kg-subsetting/paper-wi...
Source file: GeneTaxon_wikidata-20201102-all.ttl.gz ShEx: https://github.com/kg-subsetting/paper-wi...
Source file: GeneTaxon_wikidata-20210531-all.ttl.gz ShEx: https://github.com/kg-subsetting/paper-wi...
Source file: GeneTaxon_wikidata-20180115-all.ttl.gz ShEx: https://github.com/kg-subsetting/paper-wi...
Source file: GeneTaxon_wikidata-20170821-all.ttl.gz ShEx: https://github.com/kg-subsetting/paper-wi...