This dataset contains 871 articles from Wikipedia (retrieved on 8 August 2016), selected from the list of featured articles (<https://en.wikipedia.org/wiki/Wikipedia:Featured_articles>) in the 'Media', 'Literature and Theater', 'Music biographies', 'Media biographies', 'History biographies' and 'Video gaming' categories. From these articles, the document structure, i.e. the sections and subsections of the text, is extracted.
For each existing Wikipedia language edition, the dataset contains a classification of the articles ...
Wikipedia Corpus is a bilingual—Spanish-English—single-label corpus composed of 3,019 documents abou...
This is a dataset of 40,664,485 citations extracted from the English Wikipedia February 2023 dump (https...
Movies-related articles extracted from Wikipedia. For all articles, the figures and tables have bee...
A subset of articles extracted from the French Wikipedia XML dump. Data published here include 5 dif...
This project contains data sets consisting of all page headings (article section titles) for the Eng...
Video games-related articles extracted from Wikipedia. For all articles, the figures and tables hav...
A Gensim LDA Topic Model of English Wikipedia biographical articles with list of all 1.8M articles, ...
Here N_a is the number of articles. Wikipedia data were collected in midd...
Three corpora in different domains extracted from Wikipedia. For all datasets, the figures and tabl...
This thesis deals with automatic type extraction in English Wikipedia articles and their attributes....
ABSTRACT: The article presents an overview of relevant Wikipedia studies, es...
Set of 408 biographical articles extracted from Wikipedia. Most of them are represented by 5 differ...
A video games NLP dataset extracted from Wikipedia. For all articles, the figures and tables have b...