This paper presents and evaluates a novel and flexible chunking method using Constraint Grammar (CG) rules to introduce chunk edges in corpus annotation. Our method exploits pre-existing (non-constituent) morphosyntactic annotation such as part-of-speech or function tags, but can also be made to work on raw text, integrated with other CG modules. The first version of the chunker was developed for German CG-annotated interview data, with a parallel English version derived from the German one, indicating a high degree of language-independence of the rules in the presence of generalized syntactic-functional tags (e.g. subject, object, modifier). Two different approaches are discussed, one for minimal, flat chunking, the other for deep, nested...
By parsing is here meant the automatic assignment of morphological and syntactic structure (but not ...
In this paper, we present the results of an experiment with utilizing a stochastic morphosyntactic t...
This paper presents a solution for overcoming the lexical resource gap when mounting rule-...
Among the most common operations in language processing are segmentation and labelling [7]. Chunki...
Chunking means splitting the sentences into tokens and then grouping them in a meaningful way. When ...
In this paper, we try three distinct approaches to chunk transcribed oral data...
This paper describes a CoNLL-style chunk representation for the Tübingen Treebank ...
We present in this paper a syntactic annotation project relying on a linguisti...
The paper describes a rule-based system for tagging clause boundaries, implemented for annotating th...
This paper describes an approach to treebank development which relies on the manual development of a...
In this paper we discuss a rule-based approach to chunking sentences in Croatian, implemented using ...
Unlike corpora of written language where segmentation can mainly be derived from orthographic punctu...