Lemmatization is a central task in many NLP applications. Despite its importance, the number of (freely) available and easy-to-use tools for German is very limited. To fill this gap, we developed a simple lemmatizer that can be trained on any lemmatized corpus. For a full-form word, the tagger tries to find the sequence of morphemes that is most likely to generate that word. From this sequence of tags we can easily derive the stem, the lemma, and the part of speech (PoS) of the word. We show (i) that the quality of this approach is comparable to state-of-the-art methods and (ii) that we can improve the results of PoS tagging when we include the morphological analysis of each word.
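The idea of finding the most likely morpheme sequence for a full-form word can be illustrated with a minimal sketch. It is not the authors' implementation: the toy lexicon, its probabilities, the tag names (`STEM:N`, `STEM:V`, `SUF`), the `LEMMA_ENDING` rule, and the function names `segment` and `analyze` are all assumptions made for illustration. A trained system would estimate such values from a lemmatized corpus. The sketch scores each segmentation as a product of independent morpheme probabilities and picks the best one by dynamic programming.

```python
import math

# Hypothetical toy lexicon: morpheme -> (tag, probability).
# These entries and numbers are invented for illustration only.
LEXICON = {
    "kind": ("STEM:N", 0.4),
    "kinder": ("STEM:N", 0.01),
    "sag": ("STEM:V", 0.3),
    "er": ("SUF", 0.3),
    "n": ("SUF", 0.2),
    "st": ("SUF", 0.1),
}

# Assumed rule: the lemma is the stem plus a PoS-specific ending.
LEMMA_ENDING = {"N": "", "V": "en"}


def segment(word):
    """Return (log_prob, [(morpheme, tag), ...]) for the most likely
    segmentation of `word`, or None if no segmentation exists."""
    n = len(word)
    # best[i] holds the best analysis of the prefix word[:i].
    best = [None] * (n + 1)
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for start in range(end):
            if best[start] is None:
                continue
            piece = word[start:end]
            if piece not in LEXICON:
                continue
            tag, prob = LEXICON[piece]
            score = best[start][0] + math.log(prob)
            if best[end] is None or score > best[end][0]:
                best[end] = (score, best[start][1] + [(piece, tag)])
    return best[n]


def analyze(word):
    """Derive stem, lemma, and PoS from the best tagged segmentation."""
    result = segment(word.lower())
    if result is None:
        return None
    _, morphemes = result
    for piece, tag in morphemes:
        if tag.startswith("STEM:"):
            pos = tag.split(":")[1]
            return {"stem": piece, "lemma": piece + LEMMA_ENDING[pos], "pos": pos}
    return None  # segmentation contained no stem


if __name__ == "__main__":
    print(analyze("Kindern"))
    print(analyze("sagst"))
```

In this toy setup, "kindern" is split as kind + er + n rather than kinder + n because the product of the three morpheme probabilities is higher. A real model would likely also condition each morpheme on its neighbours instead of treating them as independent.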
This work presents an algorithm for the unsupervised learning, or induction, of a simple morphology ...
This paper describes a system for the unsupervised learning of morphological suffixes and stems fro...
Abstract: Lemmatisation is the process of finding the normalised forms of words appearing in text. I...
The challenge of POS tagging and lemmatization in morphologically rich languages is examined by comp...
This paper was published in the Proceedings of the COLING-ACL 1998. In this paper we ...
We present LEMMING, a modular log-linear model that jointly models lemmatization and tagging and su...
This paper presents an integrated tool for German morphology and statistical part-of-speech tagging ...
We present a method for probabilistic parsing of German words. Our approach uses a morphological an...
Morfette is a modular, data-driven, probabilistic system which learns to perform joint morphological...
With written Swiss German becoming more popular in everyday use, it has become a target for text pro...
Computational morphology is a core component in many different types of natural language processing,...
We present a novel method of statistical morphological generation, i.e. the prediction of inflecte...
The file presents the words used in the Pennsylvania German part of the ENDE corpus (www.deitsch.eu)...
We present a novel corpus-based approach to lemmatization of unknown words. The tool learns affix pa...
A core issue that hampers development and use of language technology for underresourced and morpholo...