This folder accompanies a preprint manuscript and code repository and contains both raw text data and learnt word embeddings. The data source is the set of MEDLINE articles published in or after 2000. Preprocessing extracts each article's title and abstract and applies some minor text processing. The result is a corpus of 10.5 million documents in a single 14 GB file. Word embeddings were learnt on this corpus with word2vec and fastText, and three sets are shared here: 1) word2vec skip-gram, 2) word2vec CBOW, and 3) fastText skip-gram. All three sets use the software's default parameters (e.g. context=5) with two exceptions: hierarchical softmax optimization and dimension=200. Preprint manuscript: https://...
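The three configurations described above can be sketched as command lines for the reference word2vec and fastText tools. This is a minimal illustration, not the authors' actual invocation: the binary names (`word2vec`, `fasttext`), the corpus path, and the output file names are assumptions, and the description does not say whether negative sampling was turned off alongside hierarchical softmax, so that setting is left at the tool default here.

```python
from typing import Dict, List

# Settings taken from the description: dimension=200, context=5,
# hierarchical softmax. Everything else stays at each tool's defaults.
DIMENSION = "200"
CONTEXT = "5"


def word2vec_command(corpus: str, output: str, cbow: bool) -> List[str]:
    """Build an argument list for the reference word2vec tool.

    The binary name "word2vec" is an assumption about the local install.
    """
    return [
        "word2vec",
        "-train", corpus,
        "-output", output,
        "-cbow", "1" if cbow else "0",  # 0 = skip-gram, 1 = CBOW
        "-size", DIMENSION,
        "-window", CONTEXT,
        "-hs", "1",                     # enable hierarchical softmax
        # "-negative" is not set: the description does not state it.
    ]


def fasttext_command(corpus: str, output_prefix: str) -> List[str]:
    """Build an argument list for the fastText command-line tool."""
    return [
        "fasttext", "skipgram",
        "-input", corpus,
        "-output", output_prefix,
        "-dim", DIMENSION,
        "-ws", CONTEXT,
        "-loss", "hs",                  # hierarchical softmax loss
    ]


def all_commands(corpus: str) -> Dict[str, List[str]]:
    """Return one command per shared embedding set (output names are hypothetical)."""
    return {
        "word2vec_skipgram": word2vec_command(corpus, "w2v_skipgram.txt", cbow=False),
        "word2vec_cbow": word2vec_command(corpus, "w2v_cbow.txt", cbow=True),
        "fasttext_skipgram": fasttext_command(corpus, "ft_skipgram"),
    }


if __name__ == "__main__":
    # "corpus.txt" is a placeholder for the 14 GB corpus file.
    for name, cmd in all_commands("corpus.txt").items():
        print(name, "->", " ".join(cmd))
```

Each list can be passed to `subprocess.run` once the tools are installed. Keeping the commands as plain lists makes it easy to check the settings before starting a long training run.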
The Chilean Waiting List Corpus Embeddings is a Word2Vec word embedding trained over 11 million unst...
Recent studies in the biomedical domain suggest that learning statistical word...
[Plan TL/medicine/word embeddings] Word embeddings generated from Spanish corpora that include: (a) ...
[Plan TL/medicine/word embeddings] Word embeddings generated from Spanish corpora that include: (a)...
Biomedical terms extracted using Word2vec, the most popular word embedding model in recent years, se...
Due to the recent advances in unsupervised language processing methods, it’s now possible to use lar...
Digital representation of text documents is a crucial task in machine learning...
We developed a software package called marea (marea adamantly resists egregious acronyms) that imple...
Spanish Clinical Word Embeddings in FastText. These embeddings have been generated from the largest ...
Spanish Clinical Sub-word Embeddings in FastText. These embeddings have been generated from the larg...
Spanish Biomedical Word Embeddings in FastText. These word embeddings have been generated from the l...
This paper introduces a novel collection of word embeddings, numerical representations of lexical se...
Word embeddings for software engineering, pre-trained with the Word2Vec skip-gram algorithm. The ...