This paper discusses an automatic, data-driven approach to treebank error detection. The approach adapts the use of so-called variation n-grams as defined in Dickinson and Meurers (2003) for the detection of inconsistent part-of-speech annotations to syntactic annotation. The underlying idea is to define a consistency test for the mapping from recurring strings to their syntactic annotation. The paper illustrates with a case study based on the WSJ treebank that the method successfully detects inconsistencies in syntactic category annotation. Since such inconsistencies are typically introduced by humans, our method works best for large corpora that have been annotated manually or semi-automatically, which is generally the case for current sy...
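The consistency test described above can be illustrated with a minimal sketch. It is not the authors' implementation: the corpus format, the function name `find_variation_nuclei`, the `NIL` placeholder for non-constituent occurrences, and the toy labels are all assumptions made for this example. The idea is to collect, for every recurring string, the set of category labels it receives and to flag strings that receive more than one.

```python
from collections import defaultdict

NIL = "NIL"  # assumed placeholder for occurrences that are not annotated as a constituent


def find_variation_nuclei(corpus, max_len=3):
    """Return {string: {label: count}} for strings with more than one label.

    corpus: list of (tokens, constituents) pairs, where `constituents` maps
    (start, end) token spans (end exclusive) to a syntactic category label.
    This input format is hypothetical and chosen only for the sketch.
    """
    labels_by_string = defaultdict(lambda: defaultdict(int))
    for tokens, constituents in corpus:
        for length in range(1, max_len + 1):
            for start in range(len(tokens) - length + 1):
                span = (start, start + length)
                string = tuple(tokens[start:start + length])
                # Record the category of this occurrence, or NIL if the
                # span is not a constituent in this sentence.
                labels_by_string[string][constituents.get(span, NIL)] += 1
    # A string is flagged when its occurrences disagree on the label.
    return {
        string: dict(counts)
        for string, counts in labels_by_string.items()
        if len(counts) > 1
    }


# Toy data with invented annotations, for illustration only.
corpus = [
    (["next", "Tuesday", "is", "fine"], {(0, 2): "NP"}),
    (["see", "you", "next", "Tuesday"], {(2, 4): "NP-TMP"}),
    (["until", "next", "Tuesday"], {}),
]

nuclei = find_variation_nuclei(corpus, max_len=2)
for string, counts in nuclei.items():
    print(" ".join(string), counts)
```

A flagged string is only a candidate: the variation may reflect genuine ambiguity rather than an annotation error, so the output still needs to be inspected or filtered, for example by taking the surrounding context into account.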
Annotated data is an essential ingredient in natural language processing for training and evaluating...
In this thesis, we investigate methods for automatic detection, and to some extent correction, of gr...
Recent years have seen an increasing interest in developing standards for linguistic annotation, wit...
Automatic inconsistency detection in parsed corpora is significantly helpful for building more and l...
This paper describes a statistical approach to detect annotation errors in dependency treebanks. As ...
This work describes how derivation tree fragments based on a variant of Tree Adjoining Grammar (TAG)...
Thesis abstract (Akshay Aggarwal, July 2020). This thesis attempts the correction of some errors and inco...
We studied the treebanks included in HamleDT and partially unified their label sets. Afterwards, we ...
Proceedings of the Sixth International Workshop on Treebanks and Linguistic Theories. Editors: Ko...
Abstract. Treebanks play an important role in the development of various natural language processin...
Annotating linguistic data is often a complex, time-consuming and expensive endeavor. Even with stri...
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP). In this paper, some results on the dete...
This paper describes the development of a hybrid tool for a semi-automated process for validation of...
We develop a method for detecting errors in semantic predicate-argument annotation, based on the var...