Icelandic is a morphologically complex language for which a large tagset has been created. This paper describes the design of a linguistic rule-based system for part-of-speech tagging of Icelandic text. The system contains two main components: a disambiguator, IceTagger, and an unknown word guesser, IceMorphy. IceTagger uses a small number of local elimination rules along with a global heuristics component. The heuristics guess the functional roles of the words in a sentence, mark prepositional phrases, and use the acquired knowledge to force feature agreement where appropriate. IceMorphy guesses the tag profile for unknown words and automatically fills tag profile gaps in the lexicon. Evaluation shows that Ice...
There is an increasing interest in the NLP community in developing tools for annotating historical d...
We describe experiments with morphosyntactic tagging of Old Norse narrative texts using different ta...
This paper reports the ongoing work of producing a state-of-the-art part-of-speech tagger for unedit...
Natural language processing (NLP) is a very young discipline in Iceland. Therefore, there is a lack of...
In this paper, we experiment with using Stagger, an open-source implementation of an Averaged Percep...
Proceedings of the 17th Nordic Conference of Computational Linguistics NODALIDA 2009. Editors: Kri...
Data-driven POS tagging has achieved good performance for English, but can still lag behind linguist...
Computer Science, Thesis. In this thesis, four attempts to improve the tagging accuracy for Icelandic tex...
In this paper, we describe the development of a new tagged corpus of Icelandic, consisting of about ...
This paper explores the impact of inconsistencies stemming from human mistakes on the accuracy of pa...
We describe the Corpus of Spoken Icelandic (ÍS-TAL) which is made up of 15 hours of spontaneous natu...
The new POS-tagged Icelandic corpus of the Leipzig Corpora Collection is an extensive resource for t...
This thesis describes the development of an accurate part-of-speech tagger for Faroese. To achieve this, the Icelandic ...
This paper reports on a work in progress. Auður Þórunn Rögnvaldsdóttir, Eiríkur Rögnvaldsson, Kristí...