We describe the Corpus of Spoken Icelandic (ÍS-TAL), which is made up of 15 hours of spontaneous, naturally occurring conversations, 31 conversations in all. The corpus comprises 184,080 tokens, 14,297 types and 9,221 lemmas. It has been transcribed using standard orthography. We present a list of the 30 most common lemmas in the corpus and compare it to a list of the most frequent lemmas in the written language, concluding that the differences between the two lists are smaller than expected. We have tagged the corpus morphologically with a statistical tagger that had been trained on written texts. The results are much better than we expected: the tagging accuracy is at least as high as for the written texts. The final part of the paper ...
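The abstract relies on three measurements: corpus size in tokens, types and lemmas; a comparison of the 30 most frequent lemmas against a written-language list; and tagging accuracy. The Python below is a minimal sketch of how such figures could be computed. It is not the authors' actual pipeline. Several things are assumptions: the (word_form, lemma, tag) input format, the case-sensitive type count, the set-overlap comparison, and all function names.

```python
from collections import Counter

# Assumed (hypothetical) input format: one (word_form, lemma, tag) triple per token.
Token = tuple[str, str, str]


def corpus_stats(tokens: list[Token]) -> dict[str, int]:
    """Count running tokens, distinct word forms (types) and distinct lemmas.

    Types are counted case-sensitively here; how the corpus normalizes case
    is not stated, so this is an assumption.
    """
    return {
        "tokens": len(tokens),
        "types": len({form for form, _, _ in tokens}),
        "lemmas": len({lemma for _, lemma, _ in tokens}),
    }


def top_lemmas(tokens: list[Token], n: int = 30) -> list[str]:
    """Return the n most frequent lemmas; equal counts keep first-seen order."""
    counts = Counter(lemma for _, lemma, _ in tokens)
    return [lemma for lemma, _ in counts.most_common(n)]


def shared_lemmas(list_a: list[str], list_b: list[str]) -> set[str]:
    """Lemmas present in both frequency lists, ignoring their ranks."""
    return set(list_a) & set(list_b)


def tagging_accuracy(gold: list[str], predicted: list[str]) -> float:
    """Fraction of tokens whose predicted tag equals the gold-standard tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted sequences differ in length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty sequence")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


if __name__ == "__main__":
    # Toy data with placeholder tags; this is not real ÍS-TAL annotation.
    sample = [
        ("ég", "ég", "PRON"),
        ("er", "vera", "VERB"),
        ("að", "að", "PART"),
        ("fara", "fara", "VERB"),
        ("ég", "ég", "PRON"),
        ("var", "vera", "VERB"),
    ]
    print(corpus_stats(sample))  # {'tokens': 6, 'types': 5, 'lemmas': 4}
    spoken_top = top_lemmas(sample, n=2)
    print(spoken_top)  # ['ég', 'vera']
    written_top = ["vera", "og", "ég"]  # hypothetical written-language list
    print(sorted(shared_lemmas(spoken_top, written_top)))
    gold = [tag for _, _, tag in sample]
    predicted = gold[:-1] + ["NOUN"]  # one deliberate tagging error
    print(tagging_accuracy(gold, predicted))  # 0.8333333333333334
```

The lemma overlap here is a plain set intersection. A rank-sensitive measure could be substituted if the relative order of the two frequency lists matters.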
Proceedings of the 16th Nordic Conference of Computational Linguistics NODALIDA-2007. Editors: Jo...
This dataset consists of four main resources: a concatenated dictionary of Old Icelandic parsed for ...
The new POS-tagged Icelandic corpus of the Leipzig Corpora Collection is an extensive resource for t...
Natural language processing (NLP) is a very young discipline in Iceland. Therefore, there is a lack of...
We describe experiments with morphosyntactic tagging of Old Norse narrative texts using different ta...
We present IceMorph, a semi-supervised morphosyntactic analyzer of Old Icelandic. In addition to mac...
We describe the background for and building of IcePaHC, a one million word parsed historical corpus ...
The Icelandic language is a morphologically complex language, for which a large tagset has been crea...
We introduce an Icelandic corpus of more than 250 million running words and describe the methodolog...
In this paper, we describe the development of a new tagged corpus of Icelandic, consisting of about ...
The paper describes the new Corpus of Spoken Faroese. While the corpus is still under development wi...
In this paper, we experiment with using Stagger, an open-source implementation of an Averaged Percep...