This work presents the first large-scale biomedical Spanish language models trained from scratch, using biomedical corpora totalling 1.1B tokens and an EHR corpus of 95M tokens. We compared them against general-domain and other domain-specific Spanish models on three clinical NER tasks. Our models outperform them across the three NER tasks, making them better suited to clinical NLP applications. Furthermore, our findings indicate that, when enough data is available, pre-training from scratch outperforms continual pre-training on clinical tasks, which raises the research question of which approach is optimal. Our models and fine-tuning scripts are publicly available at HuggingFace an...
Named Entity Recognition in the clinical domain and in languages different from English has the diff...
While deep learning techniques have shown promising results in many natural language processing (NLP...
BACKGROUND: Bilingual lexicon induction (BLI) is an important task in the biomedical domain as trans...
The overwhelming amount of biomedical scientific texts calls for the development of effective langu...
The largest Spanish biomedical and health corpus to date gathered from a massive Spanish health domai...
Obtaining text datasets with semantic annotations is an effortful process, yet crucial for supervise...
Availability of data and materials: Pre-trained weights of BioALBERT models together with the datase...
BACKGROUND: The volume of biomedical literature and clinical data is growing at an exponential rate....
There is an increasing interest in developing artificial intelligence (AI) systems to proce...
Word embeddings are representations of words in a dense vector space. Although they are not recent p...
Automatic clinical coding is an essential task in the process of extracting relevant information fro...
As opposed to general English, many concepts in biomedical terminology have been designed in recent ...
Embeddings This repository contains the word embeddings generated from biomedical Spanish texts corp...
Named entity recognition (NER) is an important task in the field of Process...
Lentzen M, Madan S, Lage-Rupprecht V, et al. Critical assessment of transformer-based AI models for ...