The Talko corpus of Swedish spoken in Finland is a new research tool consisting of audio files linked to annotation, i.e., transcriptions on two parallel levels and part-of-speech tagging. The corpus is searchable through a web-based interface. The recordings were made in 2005–2008 in all parts of Swedish-language Finland. They have been transcribed in a broad phonetic transcription as well as in a standard orthographic transcription. The part-of-speech tagging is done with TreeTagger, trained on the Stockholm-Umeå Corpus of written Swedish. The automatically produced part-of-speech tags are manually corrected for subsets of the data, and the manually corrected data are subsequently added to the training data. This will gradually im...
Nonstandard dialects, characterized by atypical lexical items, pronunciation and grammar, often degr...
Various language technology applications have already been in everyday use for decades. ...
A searchable database of speech samples from more than 100 Swedish dialects is being established for...
In this paper, we describe the Nordic Dialect Corpus, which has recently been completed. The corpus ...
This paper summarizes work on spoken language at the Department of Linguistics, Göteborg University. ...
Proceedings of the 17th Nordic Conference of Computational Linguistics NODALIDA 2009. Editors: Kri...
The paper describes the first part of the Nordic Dialect Corpus. This is a tool that combines a numb...
Spontal-N is a corpus of spontaneous, interactional Norwegian. To our knowledge, it is the first cor...
Funding Information: This work was partly funded by Academy of Finland (Grant Numbers 337073, 329267...
This work presents Stagger, a new open-source part of speech tagger for Swedish based on the Average...
This paper describes the Nordic Dialect Corpus, a corpus that consists of transcribed spok...
This paper reports on two experiments with a probabilistic part-of-speech tagger, trained on a tagge...
This research project is a sociolinguistic investigation into change and variation in Finland-Swedis...
This research database consists of recordings of a little more than 1300 speakers representing 107 S...