Several Corpus Linguistics research groups have gone beyond collating 'raw' text to annotating it syntactically. However, the linguists developing these resources have used quite different word-tagging and parse-tree labelling schemes in each annotated corpus. This restricts the accessibility of each corpus, making it impossible for speech and handwriting researchers to collate them into a single very large training set. This is particularly problematic because there is evidence that any one of these parsed corpora on its own is too small for a general statistical model of grammatical structure, whereas the combined size of all the above annotated corpora should deliver a much more reliable model. We are developing a set of mapp...
There exist as many as 7000 natural languages in the world, and a huge number of documents describin...
Hand crafted annotated corpora are acknowledged as critical elements for the Human Language Technolo...
This paper describes a new method, COMBI-BOOTSTRAP, to exploit existing taggers and lexical resource...
Corpus linguistic and language technological research needs empirical corpus data with nearly correc...
Corpus resources for Linguistics and NLP research on discourse phenomena, such as coreference and di...
Linguistically annotated corpora are becoming a central part of the corpus linguistics field. One of...
The creation and analysis of corpus linguistic resources can be a costly and error-prone process. Ap...
Corpus-based Machine Learning of linguistic annotations has been a key topic for all areas of Natura...
There is a need for a general framework for linguistic annotation that is flexible and extensible en...
Linguistic annotation adds valuable information to a corpus. Annotated corpora are highly useful for...
A corpus without annotations can exist, but its usefulness might be so limited that, for most contem...
This book re-examines the notion of word associations, more precisely collocations. It attempts to c...
This multilingual resource contains corpora in which verbal MWEs have been manually annotated. VMWEs...