We present an open-source toolkit that makes it possible (i) to reconstruct past states of Wikipedia and (ii) to efficiently access the edit history of Wikipedia articles. Reconstructing past states of Wikipedia is a prerequisite for reproducing previous experimental work based on Wikipedia. Beyond that, the edit history of Wikipedia articles has been shown to be a valuable knowledge source for NLP, but access is severely impeded by the lack of efficient tools for managing the huge amount of data provided. By using a dedicated storage format, our toolkit decreases the data volume to less than 2% of the original size and at the same time provides an easy-to-use interface to access the revision data. The language-independent design allows ...
STiki is an anti-vandalism tool for Wikipedia. Unlike similar tools, STiki does not rely on natural ...
In this paper, we analyze a novel set of features for the task of automatic edit category classifica...
To increase its credibility and preserve the trust of its readers, Wikipedia n...
Wikipedia is written in the wikitext markup language. When serving content, the MediaW...
This dataset includes the historical versions of all individual references per article in the Englis...
Much of the work in the Semantic Web relying on Wikipedia as the main source of knowledge often wo...
Wikipedia revision metadata for every edit to every page in seven major language versions of Wikiped...
Wikis are popular tools commonly used to support distributed collaborative work. Wikis can be seen as...
DBpedia is a huge dataset essentially extracted from the content and structure...
DBpedia is a huge dataset essentially extracted from the content and structure...
We describe a DBpedia extractor materializing the editing history of Wikipedia...
Naturally-occurring instances of linguistic phenomena are important both for training and for evalua...
We present a dataset that contains every instance of all tokens (words) ever written in undeleted, n...
Wikipedia, the popular online encyclopedia, has in just six years grown from an adjunct to the now-d...
We present a novel paradigm for obtaining large amounts of training data for computational linguisti...