These FastText word embeddings for the social sciences were trained on 37,604 open access research papers from the Social Science Open Access Repository (SSOAR, https://www.gesis.org/ssoar/home). They are available in German and English. Training settings: skip-gram model, character n-grams with 3 ≤ n ≤ 6, dimensions of 100, 150, 200, 300, and 500, five epochs, learning rate 0.05, and five negative examples. Please cite: Schiffers, Ricardo, Dagmar Kern, and Daniel Hienert. 2022. "Evaluation of Word Embeddings for the Social Sciences." In Proceedings of the 6th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, edited by Stefania Degaetano, Anna Kazantseva, Nils Reiter, and Stan Szpakowicz, 1-6...
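The subword setting in the description above (character n-grams with n from 3 to 6) can be illustrated with a minimal sketch. It assumes the usual FastText convention of wrapping each word in `<` and `>` boundary markers before slicing. The function name `char_ngrams` and the dictionary `TRAINING_SETTINGS` are hypothetical and are not taken from the authors' code. The dictionary keys follow the keyword names of `fasttext.train_unsupervised`, and the choice of `dim=300` is only one of the five published dimensions.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of `word` for min_n <= n <= max_n.

    The word is wrapped in '<' and '>' so that prefixes and suffixes
    yield n-grams distinct from word-internal ones.
    """
    wrapped = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n across the wrapped word.
        for start in range(len(wrapped) - n + 1):
            grams.append(wrapped[start:start + n])
    return grams


# Hypothetical settings mirroring the description; these keys could be
# passed as keyword arguments to fasttext.train_unsupervised(input=..., **TRAINING_SETTINGS).
TRAINING_SETTINGS = {
    "model": "skipgram",
    "minn": 3,     # shortest character n-gram
    "maxn": 6,     # longest character n-gram
    "dim": 300,    # assumption: one of 100, 150, 200, 300, 500
    "epoch": 5,
    "lr": 0.05,
    "neg": 5,      # negative examples per positive example
}

if __name__ == "__main__":
    print(char_ngrams("where", 3, 3))
```

Because a word vector is built from the vectors of these n-grams, the embeddings can also produce vectors for terms that never appeared in the training papers, which is useful for rare domain vocabulary.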
Language can provide a window into individuals, families, and their community and culture, and, at t...
We identify three gaps that limit the utility and obstruct the progress of computational text analys...
[Plan TL/medicine/word embeddings] Word embeddings generated from Spanish corpora that include: (a) ...
To extract essential information from complex data, computer scientists have been developing machine...
Text, the written representation of human thought and communication in natural language, has been a ...
This study investigates the distribution and coverage of words in the New General Service List (NGSL) an...
Accompanying a preprint manuscript and code repository, this folder contains both raw text data and ...
Word embeddings generated with FastText from 1 GB of Ancient Greek texts. These embeddings were produ...
Natural language corpora are phenomenally rich resources for learning about people and society, and ...
These Spanish word embeddings in FastText have been generated from the largest corpus ever made in S...
The European Language Social Science Thesaurus (ELSST) is a broad-based, multilingual thesa...
Spanish Clinical Word Embeddings in FastText These embeddings have been generated from the largest ...
The increasing pace of change in languages affects many applications and algorithms for text process...
The academic literature of social sciences records human civilization and studies human social probl...
It is now commonplace to observe that we are facing a deluge of online informa...