Set of 408 biographical articles extracted from Wikipedia. Most of them are represented by 5 different files: text only, text and hyperlinks, annotations, metadata, and HTML.
In recent years, several datasets have been released that include images and text, givin...
To create the corpus, we first download from the Reuters website 27,000 random news articles (HTML webp...
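For illustration only, collecting HTML articles and reducing them to plain text could look roughly like the Python sketch below; the URLs and the html_to_text helper are hypothetical and are not part of the published pipeline.

```python
# Illustrative sketch only (not the dataset's actual collection code):
# fetch a handful of HTML news pages and reduce them to plain text.
# The URLs below are placeholders, not real article links.
import requests
from bs4 import BeautifulSoup

ARTICLE_URLS = [
    "https://www.reuters.com/article/example-1",
    "https://www.reuters.com/article/example-2",
]

def html_to_text(html: str) -> str:
    """Strip scripts/styles and return the visible text of a page."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style"]):
        tag.decompose()
    return soup.get_text(separator=" ", strip=True)

texts = []
for url in ARTICLE_URLS:
    response = requests.get(url, timeout=10)
    if response.ok:
        texts.append(html_to_text(response.text))

print(f"collected {len(texts)} article texts")
```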
In the last few months we tried to build a corpus based on the biographies of the Chinese Wikipedia....
Set of 250 biographical articles extracted from Wikipedia. Most of them are represented by 3 differ...
This corpus contains 408 Wikipedia articles. These are biographies, manually annotated to highligh...
Wikipedia Corpus is a bilingual—Spanish-English—single-label corpus composed of 3,019 documents abou...
This platform was initially designed to apply and compare Named Entity Recognition (NER) tools on co...
This dataset contains 871 articles from Wikipedia (retrieved on 8th August 2016), selected from the ...
This text corpus is composed of English Wikipedia texts extracted from the Wikipedia dump of 26th...
This archive contains a collection of language corpora. These are text files that contain samples of...
Wikipedia Human Medicine Corpus is a bilingual—Spanish-English—single-label corpus composed of 2,143...
A subset of articles extracted from the French Wikipedia XML dump. Data published here include 5 dif...
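Since this entry describes articles taken from the French Wikipedia XML dump, a minimal way to stream pages from such a dump is sketched below; the dump filename is a placeholder and this is not the dataset's published extraction code (requires Python 3.8+ for the namespace wildcard).

```python
# Rough sketch, assuming a local copy of a French Wikipedia XML dump
# ("frwiki-dump.xml" is a placeholder filename).
import xml.etree.ElementTree as ET

def iter_pages(dump_path):
    """Stream (title, wikitext) pairs without loading the whole dump in memory."""
    for _event, elem in ET.iterparse(dump_path, events=("end",)):
        # Dump elements carry the MediaWiki export namespace, so match by suffix.
        if elem.tag.endswith("}page") or elem.tag == "page":
            title = elem.find("./{*}title")
            text = elem.find("./{*}revision/{*}text")
            yield (
                title.text if title is not None else "",
                (text.text or "") if text is not None else "",
            )
            elem.clear()  # release memory for pages already processed

for title, wikitext in iter_pages("frwiki-dump.xml"):
    print(title, len(wikitext))
    break
```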
Abstract: Background: Lately, there has been great interest in the application of information extrac...
Abstract: Although primarily an encyclopedia, Wikipedia’s expansive content provides a knowledge bas...
The amount of digital data derived from healthcare processes has increased tremendously in the last...