Wikipedia is a valuable resource whose usefulness goes beyond the encyclopedia itself. This paper proposes using Wikipedia as a large source of text for language research and explains the procedure followed to turn raw Spanish Wikipedia data into a usable text source. The procedure deals with the format of the source data (wiki syntax), the segmentation of written text into individual sentences, and the conversion of acronyms and numbers into the way they are spoken. Some parts of the case described here are specific to the Spanish Wikipedia, but the ideas and several steps of the procedure can be generalised to any language or text source.
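The three steps named above (removing wiki syntax, splitting text into sentences, and rewriting abbreviations and numbers as they are said) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the regular expressions cover only a common subset of MediaWiki markup, the abbreviation table `ABBREVIATIONS` and every function name are hypothetical, the sentence splitter is a simple punctuation-plus-capital heuristic, and number verbalization is limited to bare integers below one million in masculine form.

```python
import re

# Hypothetical abbreviation table; a real Spanish pipeline needs a much larger,
# curated list. Expanded *before* sentence splitting so that the periods inside
# the abbreviations are not mistaken for sentence ends.
ABBREVIATIONS = {"EE. UU.": "Estados Unidos", "p. ej.": "por ejemplo"}

UNITS = ["cero", "uno", "dos", "tres", "cuatro", "cinco", "seis", "siete",
         "ocho", "nueve", "diez", "once", "doce", "trece", "catorce", "quince",
         "dieciséis", "diecisiete", "dieciocho", "diecinueve", "veinte",
         "veintiuno", "veintidós", "veintitrés", "veinticuatro", "veinticinco",
         "veintiséis", "veintisiete", "veintiocho", "veintinueve"]
TENS = {3: "treinta", 4: "cuarenta", 5: "cincuenta", 6: "sesenta",
        7: "setenta", 8: "ochenta", 9: "noventa"}
HUNDREDS = {1: "ciento", 2: "doscientos", 3: "trescientos", 4: "cuatrocientos",
            5: "quinientos", 6: "seiscientos", 7: "setecientos",
            8: "ochocientos", 9: "novecientos"}

# Boundary: '.', '!' or '?' followed by whitespace and an uppercase letter,
# optionally preceded by an opening ¿, ¡, " or «.
SENTENCE_BOUNDARY = re.compile(r'(?<=[.!?])\s+(?=[¿¡"«]?[A-ZÁÉÍÓÚÑÜ])')


def strip_wiki_markup(text: str) -> str:
    """Remove a common subset of MediaWiki syntax (not a full parser)."""
    text = re.sub(r"<ref[^>/]*/>", "", text)                      # <ref name="x"/>
    text = re.sub(r"<ref[^>]*>.*?</ref>", "", text, flags=re.DOTALL)
    previous = None
    while previous != text:                                        # nested {{...}}:
        previous = text
        text = re.sub(r"\{\{[^{}]*\}\}", "", text)                 # innermost first
    text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", text)  # keep link label
    text = re.sub(r"'{2,}", "", text)                              # ''italic'', '''bold'''
    # Headings such as "== Historia ==" keep only their text, on their own line.
    return re.sub(r"^=+[ \t]*(.*?)[ \t]*=+[ \t]*$", r"\1", text, flags=re.MULTILINE)


def expand_abbreviations(text: str) -> str:
    for abbreviation, expansion in ABBREVIATIONS.items():
        text = text.replace(abbreviation, expansion)
    return text


def split_sentences(text: str) -> list[str]:
    """Treat each line as a paragraph and split it at sentence boundaries."""
    sentences = []
    for paragraph in text.splitlines():
        sentences.extend(s.strip() for s in SENTENCE_BOUNDARY.split(paragraph) if s.strip())
    return sentences


def _below_thousand(n: int) -> str:
    if n == 100:
        return "cien"
    hundreds, rest = divmod(n, 100)
    parts = [HUNDREDS[hundreds]] if hundreds else []
    if rest or not parts:
        if rest < 30:
            parts.append(UNITS[rest])
        else:
            tens, units = divmod(rest, 10)
            parts.append(TENS[tens] + (" y " + UNITS[units] if units else ""))
    return " ".join(parts)


def number_to_words(n: int) -> str:
    """Spanish cardinal for 0 <= n < 1,000,000 (masculine, no gender agreement)."""
    if not 0 <= n < 1_000_000:
        raise ValueError("this sketch only handles 0..999999")
    thousands, rest = divmod(n, 1000)
    if thousands == 0:
        return _below_thousand(rest)
    if thousands == 1:
        prefix = "mil"
    else:
        words = _below_thousand(thousands)
        # Apocope before "mil": veintiuno -> veintiún, "... uno" -> "... un".
        if words.endswith("veintiuno"):
            words = words[: -len("veintiuno")] + "veintiún"
        elif words.endswith("uno"):
            words = words[:-3] + "un"
        prefix = words + " mil"
    return prefix if rest == 0 else prefix + " " + _below_thousand(rest)


def verbalize_numbers(text: str) -> str:
    # Bare integers of up to six digits only; decimals ("3,5"), dotted
    # thousands ("1.000"), ordinals and Roman numerals are NOT handled.
    return re.sub(r"\b\d{1,6}\b", lambda m: number_to_words(int(m.group())), text)


def wiki_to_sentences(raw: str) -> list[str]:
    text = expand_abbreviations(strip_wiki_markup(raw))
    return [verbalize_numbers(s) for s in split_sentences(text)]
```

For example, `wiki_to_sentences("'''Madrid''' es la capital desde 1561.")` returns `["Madrid es la capital desde mil quinientos sesenta y uno."]`. Abbreviations are expanded before segmentation because a form like "EE. UU." would otherwise be cut at its internal period, while numbers are verbalized per sentence, after segmentation, since digits do not interfere with the boundary heuristic.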
Wikipedia is not only a large encyclopedia, but lately also a source of linguistic data for various ...
Wikipedia has long presented itself as “the biggest multilingual free-content encyclopedia on the In...
The aim of the study was to analyze the online encyclopaedia Wikipedia as a tool for the disseminati...
The purpose of this study is the analysis of the potential of Wikipedia as a key tool for the dissem...
Spanish text corpus extracted from Wikipedia, using the platform described in Cadavid Rengifo, Hécto...
While Wikipedia exists in 287 languages, its content is unevenly distributed among them. It is there...
Wikipedia, the popular online encyclopedia, has in just six years grown from an adjunct to the now-d...
Wikipedia Corpus is a bilingual—Spanish-English—single-label corpus composed of 3,019 documents abou...
Although primarily an encyclopedia, Wikipedia’s expansive content provides a knowledge bas...
Wikidata: your public linked database on anything. Wikipedia exists in almost 300 language versions. ...
This paper describes SW1, the first version of a semantically annotated snapshot of the English Wikip...
Wikipedia is a goldmine of information; not just for its many readers, but also for the growing comm...
Wikipedia has become one of the most popular resources in natural language processing and it is used...