This dataset contains 871 articles from Wikipedia (retrieved on 8th August 2016), selected from the list of featured articles in the 'Media', 'Literature and Theater', 'Music biographies', 'Media biographies', 'History biographies' and 'Video gaming' categories. For each article, the structure of the document, i.e. its sections and subsections, is extracted. The dataset also contains a proposed clustering of the event names to increase the comparability of Wikipedia articles.
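The description above does not include the extraction code itself; the following is a minimal Python sketch of how the section and subsection structure could be recovered from an article's raw wikitext, assuming the standard '== Heading ==' markup. The function name and sample text are illustrative only and are not part of the dataset.

import re

# Wikitext heading syntax: '== Title ==' is a section (level 2),
# '=== Title ===' a subsection (level 3), and so on.
HEADING_RE = re.compile(r"^(={2,6})\s*(.*?)\s*\1\s*$", re.MULTILINE)

def extract_structure(wikitext):
    """Return (level, title) pairs in document order.

    Level 2 corresponds to sections, level 3 to subsections.
    """
    return [(len(m.group(1)), m.group(2)) for m in HEADING_RE.finditer(wikitext)]

if __name__ == "__main__":
    sample = (
        "== Gameplay ==\n"
        "Intro paragraph.\n"
        "=== Plot ===\n"
        "Details.\n"
        "== Reception ==\n"
    )
    print(extract_structure(sample))
    # [(2, 'Gameplay'), (3, 'Plot'), (2, 'Reception')]

Comparing articles by the resulting sequences of (level, title) pairs is one plausible use of the extracted structure across the six categories.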
A video games NLP dataset extracted from Wikipedia. For all articles, the figures and tables have b...
Most traditional text clustering methods are based on “bag of words” (BOW) representation based on ...
Wikipedia is the largest existing knowledge repository that is growing on a genuine crowdsourcing su...
This dataset contains 871 articles from Wikipedia (retrieved on 8th August 2016), selected from the ...
Movies-related articles extracted from Wikipedia. For all articles, the figures and tables have bee...
Reflecting the rapid growth of science, technology, and culture, it has become common practice to co...
A subset of articles extracted from the French Wikipedia XML dump. Data published here include 5 dif...
Wikipedia is the largest existing knowledge repository that is growing on a genuine crowdsourcing su...
This thesis deals with automatic type extraction in English Wikipedia articles and their attributes....
Wikipedia is a rich source of information across many knowledge domains. Yet, ...
Three corpora in different domains extracted from Wikipedia. For all datasets, the figures and tabl...
The process whereby inferences are made from textual data is broadly referred to as text mining. In ...
For each existing Wikipedia language edition, the dataset contains a classification of the articles ...
As free online encyclopedias with massive volumes of content, Wikipedia and Wikidata are key to many...
This paper shows how Wikipedia and the semantic knowledge it contains can be exploited for document ...