This paper explores the relationship between tagset design and the linguistic properties of inflected languages for the task of morphosyntactic tagging. Information-theoretic measures and statistics are reported for these languages; they show, unsurprisingly, that tagsets for morphologically rich languages are larger than those for English and that the average tag/token ambiguity is higher. The surprising outcome of the experiments concerns Catalan, Czech, Polish, Portuguese, and Russian, which are considered (to various degrees) "free word order" languages: knowledge of the preceding tag reduces the uncertainty about the current tag when the detailed tagset is used, but when the tagset is reduced to the size of the ...
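The measures mentioned in this abstract can be illustrated with a minimal sketch. It assumes the uncertainty about a tag is measured as Shannon entropy in bits, that "knowledge about the preceding tag" corresponds to the conditional entropy H(T_i | T_{i-1}) estimated from adjacent tag pairs, and that tag/token ambiguity is the number of distinct tags observed for a token's word form, averaged over tokens. The function names and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tag_entropy(tags):
    """Unigram tag entropy H(T) in bits, from relative frequencies."""
    counts = Counter(tags)
    total = len(tags)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def conditional_tag_entropy(tags):
    """H(T_i | T_{i-1}) in bits, estimated from adjacent tag pairs."""
    pairs = list(zip(tags, tags[1:]))
    pair_counts = Counter(pairs)
    prev_counts = Counter(prev for prev, _ in pairs)
    total = len(pairs)
    h = 0.0
    for (prev, _cur), c in pair_counts.items():
        p_pair = c / total                    # joint probability p(prev, cur)
        p_cur_given_prev = c / prev_counts[prev]
        h -= p_pair * math.log2(p_cur_given_prev)
    return h


def average_ambiguity(tagged_tokens):
    """Mean number of distinct observed tags per token (assumed definition)."""
    tags_by_form = {}
    for form, tag in tagged_tokens:
        tags_by_form.setdefault(form, set()).add(tag)
    return sum(len(tags_by_form[form]) for form, _ in tagged_tokens) / len(tagged_tokens)


if __name__ == "__main__":
    # Toy tag sequence; real estimates would come from an annotated corpus.
    toy_tags = ["D", "N", "V", "D", "N"]
    print("H(T)          =", tag_entropy(toy_tags))
    print("H(T | T_prev) =", conditional_tag_entropy(toy_tags))
```

The difference H(T) minus H(T_i | T_{i-1}) is the reduction in uncertainty contributed by the preceding tag. Computing it once with a detailed tagset and once with a reduced one would give a comparison of the kind the abstract describes.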
The paper evaluates tagging techniques on a corpus of Slovene, where we are faced with a large numbe...
Statistical n-gram taggers like that of [Church 1988] or [Foster 1991] assign a part-of-speech label...
We present results of an experiment dealing with combining outputs of five part-of-speech taggers v...
Anna Feldman is an assistant professor of linguistics and computer science. Her interests are corpus...
This work presents a part of a more global study on the problem of parsing of Czech and on the knowl...
This paper presents an original approach to part-of-speech tagging of fine-grained features (such as...
This paper presents a new part-of-speech tagger that takes into account both linguistic knowledge and...
This paper deals with the development of morphosyntactic taggers for spoken varieties of the Slavic ...
Comparative studies in theoretical linguistics and the production of bi- and multilingual ...
In this paper Brill's rule-based PoS tagger is tested and adapted for Hungarian. It is shown th...
The paper presents one way of reconciling data sparseness with the requirement of high accuracy tagg...
The challenge of POS tagging and lemmatization in morphologically rich languages is examined by comp...
In this paper, we investigate automatic tagging of French corpora and compare morpho-syntactic prope...
Many of the methods developed for Western European languages and widely used to produce annotate...
This paper reports the principles behind designing a tagset to cover Russian morphosyntactic phenome...