Set of 408 biographical articles extracted from Wikipedia. Most of them are represented by 5 different files: text only, text and hyperlinks, annotations, metadata, and HTML.
In recent years, several datasets have been released that include images and text, givin...
To create the corpus, we first download from the Reuters website 27,000 random news articles (HTML webp...
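For illustration only, collecting HTML articles and reducing them to plain text could look roughly like the Python sketch below; the URLs and the html_to_text helper are hypothetical and are not part of the published pipeline.

```python
# Illustrative sketch only (not the dataset's actual collection code):
# fetch a handful of HTML news pages and reduce them to plain text.
# The URLs below are placeholders, not real article links.
import requests
from bs4 import BeautifulSoup

ARTICLE_URLS = [
    "https://www.reuters.com/article/example-1",
    "https://www.reuters.com/article/example-2",
]

def html_to_text(html: str) -> str:
    """Strip scripts/styles and return the visible text of a page."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style"]):
        tag.decompose()
    return soup.get_text(separator=" ", strip=True)

texts = []
for url in ARTICLE_URLS:
    response = requests.get(url, timeout=10)
    if response.ok:
        texts.append(html_to_text(response.text))

print(f"collected {len(texts)} article texts")
```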
In the last few months we tried to build a corpus based on the biographies of the Chinese Wikipedia....
Set of 250 biographical articles extracted from Wikipedia. Most of them are represented by 3 differ...
This corpus contains 408 Wikipedia articles. These are biographies, manually annotated to highligh...
Wikipedia Corpus is a bilingual—Spanish-English—single-label corpus composed of 3,019 documents abou...
This platform was initially designed to apply and compare Named Entity Recognition (NER) tools on co...
This dataset contains 871 articles from Wikipedia (retrieved on 8th August 2016), selected from the ...
This text corpus is composed of English Wikipedia texts extracted from the Wikipedia dump of 26th...
This archive contains a collection of language corpora. These are text files that contain samples of...
Wikipedia Human Medicine Corpus is a bilingual—Spanish-English—single-label corpus composed of 2,143...
A subset of articles extracted from the French Wikipedia XML dump. Data published here include 5 dif...
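Since this entry describes articles taken from the French Wikipedia XML dump, a minimal way to stream pages from such a dump is sketched below; the dump filename is a placeholder and this is not the dataset's published extraction code (requires Python 3.8+ for the namespace wildcard).

```python
# Rough sketch, assuming a local copy of a French Wikipedia XML dump
# ("frwiki-dump.xml" is a placeholder filename).
import xml.etree.ElementTree as ET

def iter_pages(dump_path):
    """Stream (title, wikitext) pairs without loading the whole dump in memory."""
    for _event, elem in ET.iterparse(dump_path, events=("end",)):
        # Dump elements carry the MediaWiki export namespace, so match by suffix.
        if elem.tag.endswith("}page") or elem.tag == "page":
            title = elem.find("./{*}title")
            text = elem.find("./{*}revision/{*}text")
            yield (
                title.text if title is not None else "",
                (text.text or "") if text is not None else "",
            )
            elem.clear()  # release memory for pages already processed

for title, wikitext in iter_pages("frwiki-dump.xml"):
    print(title, len(wikitext))
    break
```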
Abstract: Background: Lately, there has been great interest in the application of information extrac...
Abstract: Although primarily an encyclopedia, Wikipedia’s expansive content provides a knowledge bas...
The amount of digital data derived from healthcare processes has increased tremendously in the last...