In text processing, deep neural networks typically take word embeddings as input. Embeddings must ensure that relations between words are reflected through distances in a high-dimensional numeric space. To compare the quality of different text embeddings, we typically use benchmark datasets. We present a collection of such datasets for the word analogy task in nine languages: Croatian, English, Estonian, Finnish, Latvian, Lithuanian, Russian, Slovenian, and Swedish. We designed the monolingual analogy tasks to be much more culturally independent, and we also constructed cross-lingual analogy datasets for the involved languages. We present basic statistics of the created datasets and their initial evaluation using fastText embeddings.
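The word analogy task referred to above is commonly evaluated with the vector-offset (3CosAdd) method: given a pair a : b and a query word c, the answer d is the vocabulary word whose vector is closest to b − a + c by cosine similarity. The sketch below illustrates this with tiny hand-made 3-dimensional vectors; real fastText embeddings are typically 300-dimensional, and the vector values here are illustrative assumptions, not trained weights.

```python
import numpy as np

# Toy 3-dimensional "embeddings" for illustration only; real fastText
# vectors are high-dimensional and trained on large corpora.
vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "man":   np.array([0.8, 0.1, 0.1]),
    "woman": np.array([0.1, 0.2, 0.9]),
    "queen": np.array([0.2, 0.9, 0.9]),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogy(a, b, c, vectors):
    """Solve a : b :: c : ? with the 3CosAdd vector-offset method,
    excluding the three query words from the candidate set."""
    target = vectors[b] - vectors[a] + vectors[c]
    candidates = {w: v for w, v in vectors.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(target, candidates[w]))

print(analogy("man", "king", "woman", vectors))  # -> queen
```

Accuracy on an analogy benchmark is then simply the fraction of quadruples (a, b, c, d) for which the predicted word equals the gold answer d.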
In this paper we discuss the well-known claim that language analogies yield al...
Word embeddings are real-valued word representations capable of capturing lexical semantics and trai...
We introduce a new measure of distance between languages based on word embedding, called word embedd...
How does the word analogy task fit in the modern NLP landscape? Given the rarity of comparable multi...
Representation of words coming from vocabulary of a language as real vectors in a high dimensional s...
We present a method to automatically generate syntactic analogy datasets for the evaluation of word ...
Language encoders encode words and phrases in ways that capture their local semantic relatedness, bu...
Recent results show that deep neural networks using contextual embeddings significantly outperform n...
Swahili Analogy dataset contains pairs of words that are organized in 4's to facilitate word analogy...
In recent years, the use of deep neural networks and dense vector embeddings for text representation...
Over the past decade, analogies, in the form of word-level analogies, have played a significant role...
In this work, we create and make available two benchmark data sets for evaluating models of semantic...
Word embeddings intervene in a wide range of natural language processing tasks...