Word vectors, embeddings of words into a low-dimensional vector space, have proven useful for a wide range of natural language processing tasks. Our goal in this paper is to provide a useful dataset of such vectors for Swedish. To this end, we investigate three standard embedding methods: the continuous bag-of-words model and the skip-gram model with negative sampling of Mikolov et al. (2013a), and the global vectors (GloVe) of Pennington et al. (2014). We compare these methods using QVEC-CCA (Tsvetkov et al., 2016), an intrinsic evaluation measure that quantifies the correlation of learned word vectors with external linguistic resources. For this purpose we use SALDO, the Swedish Association Lexicon (Borin et al., 2013). Our experiments show that...
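To make the two word2vec variants named above concrete, the following is a minimal sketch of how such embeddings could be trained with the gensim library; it is not the authors' actual setup, and the toy corpus and all hyperparameter values (vector_size, window, negative) are illustrative assumptions.

```python
# Minimal sketch (not the paper's exact configuration) of training the
# continuous bag-of-words and skip-gram-with-negative-sampling models
# compared in the paper, using gensim's Word2Vec implementation.
from gensim.models import Word2Vec

# Toy tokenized Swedish sentences; the paper would use a large corpus.
sentences = [
    ["kungen", "och", "drottningen", "bor", "i", "slottet"],
    ["mannen", "och", "kvinnan", "gick", "till", "staden"],
]

# Continuous bag-of-words (sg=0): predicts a word from its context.
cbow = Word2Vec(sentences, vector_size=100, window=5,
                sg=0, negative=5, min_count=1)

# Skip-gram with negative sampling (sg=1, negative=5): predicts
# context words from the current word.
sgns = Word2Vec(sentences, vector_size=100, window=5,
                sg=1, negative=5, min_count=1)

# Query the learned vectors.
print(sgns.wv["kungen"])               # 100-dimensional word vector
print(sgns.wv.most_similar("kungen"))  # nearest neighbours in the space
```

The `sg` flag selects between the two architectures, and `negative` sets the number of negative samples drawn per positive example; GloVe training would require a separate tool, as gensim does not implement it.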