This repository contains a ground truth corpus for open-domain relation extraction from sentences, acquired through crowdsourcing and processed with CrowdTruth metrics, which capture ambiguity in annotations by measuring inter-annotator disagreement. The dataset contains annotations for 4,100 sentences sampled from Angeli et al. (1) and Riedel et al. (2), covering 16 relations, with each sentence annotated by 15 workers. The sentences were pre-processed with distant supervision (3) using the Freebase knowledge base to identify the term pairs in each sentence that are likely to express a relation. The crowdsourced data was collected on Figure Eight and Amazon Mechanical Turk. This corpus has been discussed in the following papers:
- This paper describes a crowdsourcing experiment on the annotation of plot-like structures in English...
- Extracting information from Web pages for populating large, cross-domain knowledge bases r...
- Crowdsourcing is a common strategy for collecting the "gold standard" labels required for many natu...
- One of the first steps in most web data analytics is creating a human-annotated ground truth, typic...
- The lack of annotated datasets for training and benchmarking is one of the main challenges of Cli...
- Cognitive computing systems require human-labeled data for evaluation and often for training. The st...
- A widespread use of linked data for information extraction is distant supervision, in which relation...
- We present a new large dataset of 12403 context-sensitive verb relations manually annotated via crow...
- This paper proposes an approach to gathering semantic annotation, which rejects the notio...
- In this paper, we introduce the CrowdTruth open-source software framework for machine-hum...
- Typically crowdsourcing-based approaches to gather annotated data use inter-annotator agreement as a...
- The process of gathering ground truth data through human annotation is a major bottleneck in the use...
- Information Extraction is an important task in Natural Language Processing, consisting of finding a ...
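To illustrate the idea of disagreement-aware metrics described above, here is a minimal sketch of how per-sentence relation scores could be computed from raw worker annotations. This is a simplified illustration, not the official CrowdTruth implementation (which uses vector-based similarity measures); the relation names and the `annotations` structure are hypothetical.

```python
# Simplified, hypothetical sketch of disagreement-aware annotation scores
# (not the official CrowdTruth metrics, which use worker-vector similarities).
from collections import Counter

def sentence_relation_scores(annotations):
    """Fraction of workers who selected each relation for one sentence.

    annotations: list with one entry per worker; each entry is the list
    of relations that worker selected for the sentence.
    """
    counts = Counter()
    for worker_choices in annotations:
        counts.update(set(worker_choices))  # count each worker at most once per relation
    n_workers = len(annotations)
    return {rel: c / n_workers for rel, c in counts.items()}

def sentence_clarity(annotations):
    """Highest relation score: near 1.0 for clear sentences, low for ambiguous ones."""
    scores = sentence_relation_scores(annotations)
    return max(scores.values()) if scores else 0.0

# In the corpus each sentence has 15 workers; a toy example with 4 workers:
votes = [
    ["place_of_birth"],
    ["place_of_birth"],
    ["lived_in"],
    ["place_of_birth", "lived_in"],
]
print(sentence_relation_scores(votes))  # {'place_of_birth': 0.75, 'lived_in': 0.5}
print(sentence_clarity(votes))          # 0.75
```

A low clarity score flags a sentence whose relation is ambiguous to human readers, which is exactly the signal the corpus is designed to preserve rather than discard as annotator noise.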