Word vectors, embeddings of words into a low-dimensional vector space, have proven useful for a wide range of natural language processing tasks. Our goal in this paper is to provide a useful dataset of such vectors for Swedish. To this end, we investigate three standard embedding methods: the continuous bag-of-words model and the skip-gram model with negative sampling of Mikolov et al. (2013a), and the global vectors (GloVe) of Pennington et al. (2014). We compare these methods using QVEC-CCA (Tsvetkov et al., 2016), an intrinsic evaluation measure that quantifies the correlation of learned word vectors with external linguistic resources. For this purpose we use SALDO, the Swedish Association Lexicon (Borin et al., 2013). Our experiments show that...
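To make the two word2vec variants named above concrete, the following is a minimal sketch of how such embeddings could be trained with the gensim library; it is not the authors' actual setup, and the toy corpus and all hyperparameter values (vector_size, window, negative) are illustrative assumptions.

```python
# Minimal sketch (not the paper's exact configuration) of training the
# continuous bag-of-words and skip-gram-with-negative-sampling models
# compared in the paper, using gensim's Word2Vec implementation.
from gensim.models import Word2Vec

# Toy tokenized Swedish sentences; the paper would use a large corpus.
sentences = [
    ["kungen", "och", "drottningen", "bor", "i", "slottet"],
    ["mannen", "och", "kvinnan", "gick", "till", "staden"],
]

# Continuous bag-of-words (sg=0): predicts a word from its context.
cbow = Word2Vec(sentences, vector_size=100, window=5,
                sg=0, negative=5, min_count=1)

# Skip-gram with negative sampling (sg=1, negative=5): predicts
# context words from the current word.
sgns = Word2Vec(sentences, vector_size=100, window=5,
                sg=1, negative=5, min_count=1)

# Query the learned vectors.
print(sgns.wv["kungen"])               # 100-dimensional word vector
print(sgns.wv.most_similar("kungen"))  # nearest neighbours in the space
```

The `sg` flag selects between the two architectures, and `negative` sets the number of negative samples drawn per positive example; GloVe training would require a separate tool, as gensim does not implement it.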