There is increasing interest in the NLP community in developing tools for annotating historical data, for example to facilitate research in corpus linguistics. In this work, we experiment with several PoS taggers using a sub-corpus of the Icelandic Saga Corpus. The work proceeds in three main steps. First, we evaluate taggers trained on Modern Icelandic when they are applied to Old Icelandic. Second, we semi-automatically correct errors in the training corpus using a bootstrapping method. Finally, we evaluate the taggers on the corrected training corpus. The best-performing single tagger is Stagger, a tagger based on the averaged perceptron algorithm, which obtains an accuracy of 91.76%. By combining the output of three tagg...
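As an illustration of the evaluation step described above, the following is a minimal sketch of token-level tagging accuracy. It is not the authors' code: the function names, the toy tokens, and the coarse tags are hypothetical placeholders and do not reflect the tagset used for the corpus. The second helper lists tokens where a tagger's output disagrees with the corpus tag, which is one plausible way to surface candidates for manual review; the abstract does not specify how the bootstrapping procedure works, so this part is an assumption.

```python
from typing import List, Tuple


def tagging_accuracy(gold: List[str], predicted: List[str]) -> float:
    """Return the fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty sequence")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def disagreement_candidates(
    tokens: List[str], gold: List[str], predicted: List[str]
) -> List[Tuple[int, str, str, str]]:
    """List (index, token, gold_tag, predicted_tag) wherever the tags differ.

    Assumption: such disagreements are only *candidates* for review; either
    the tagger or the corpus annotation may be the one that is wrong.
    """
    return [
        (i, tok, g, p)
        for i, (tok, g, p) in enumerate(zip(tokens, gold, predicted))
        if g != p
    ]


# Toy data with hypothetical coarse tags (not the corpus tagset).
tokens = ["hann", "fór", "heim"]
gold = ["PRON", "VERB", "ADV"]
predicted = ["PRON", "VERB", "NOUN"]

accuracy = tagging_accuracy(gold, predicted)
candidates = disagreement_candidates(tokens, gold, predicted)
print(f"accuracy = {accuracy:.2%}")
print(candidates)
```

Reporting accuracy per token, as here, matches the way a figure such as 91.76% is usually stated for PoS taggers, but whether the paper also reports other measures is not visible from the abstract.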
Tagger accuracy deteriorates when applied to texts different from the training corpus, e.g. with res...
We describe the Corpus of Spoken Icelandic (ÍS-TAL) which is made up of 15 hours of spontaneous natu...
The field of Part of Speech (POS) tagging has made slow but steady progress during the last decade, ...
In this paper, we describe the development of a new tagged corpus of Icelandic, consisting of about ...
We describe experiments with morphosyntactic tagging of Old Norse narrative texts using different ta...
This paper explores the impact of inconsistencies stemming from human mistakes on the accuracy of pa...
In this paper, we experiment with using Stagger, an open-source implementation of an Averaged Percep...
Proceedings of the 16th Nordic Conference of Computational Linguistics NODALIDA-2007. Editors: Jo...
Data driven POS tagging has achieved good performance for English, but can still lag behind linguist...
The new POS-tagged Icelandic corpus of the Leipzig Corpora Collection is an extensive resource for t...
We experiment with extending the dictionaries used by three open-source part-of-speech taggers, by ...
The Icelandic language is a morphologically complex language, for which a large tagset has been crea...
Proceedings of the 17th Nordic Conference of Computational Linguistics NODALIDA 2009. Editors: Kri...