Pretraining deep neural network architectures with a language modeling objective has brought large improvements for many natural language processing tasks. Taking BERT, one such recently proposed architecture, as our example, we demonstrate that despite being trained on huge amounts of data, deep language models still struggle to understand rare words. To fix this problem, we adapt Attentive Mimicking, a method that was designed to explicitly learn embeddings for rare words, to deep language models. In order to make this possible, we introduce one-token approximation, a procedure that enables us to use Attentive Mimicking even when the underlying language model uses subword-based tokenization, i.e., it does not assign embeddings to all words. To eval...
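To make the one-token approximation idea above concrete, the following is a minimal sketch of how such a procedure could look: it learns a single input embedding for a word that BERT's tokenizer splits into several subwords, by pushing BERT's hidden states for the single-embedding input toward those obtained with the full subword sequence. The model name, the "<W>" placeholder convention, the loss over shared context positions, and the hyperparameters are illustrative assumptions, not the settings of the original method (which, among other differences, also aligns representations across layers).

```python
import torch
from transformers import BertModel, BertTokenizer

# Illustrative sketch of one-token approximation (OTA), not the paper's exact procedure:
# learn a single input embedding for a multi-subword word such that BERT's hidden
# states with that embedding mimic the states obtained with the full subword sequence.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()
for p in model.parameters():  # BERT itself stays frozen; only the new embedding is trained
    p.requires_grad_(False)
emb_layer = model.get_input_embeddings()

def one_token_approximation(word, contexts, steps=200, lr=0.01):
    """contexts: sentences containing the placeholder '<W>' where `word` occurs (assumed format)."""
    sub_ids = tokenizer(word, add_special_tokens=False)["input_ids"]
    # Initialize from the mean of the word's subword embeddings.
    e = emb_layer.weight[sub_ids].mean(dim=0).clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([e], lr=lr)

    for _ in range(steps):
        opt.zero_grad()
        loss = torch.tensor(0.0)
        for ctx in contexts:
            before, after = ctx.split("<W>")
            ids_b = tokenizer(before, add_special_tokens=False)["input_ids"]
            ids_a = tokenizer(after, add_special_tokens=False)["input_ids"]
            cls, sep = tokenizer.cls_token_id, tokenizer.sep_token_id

            # Target: hidden states when the word is fed as its full subword sequence.
            ids_multi = torch.tensor([[cls] + ids_b + sub_ids + ids_a + [sep]])
            with torch.no_grad():
                target = model(input_ids=ids_multi).last_hidden_state[0]

            # Prediction: the subword span is replaced by the single learned embedding e.
            left = emb_layer(torch.tensor([cls] + ids_b))
            right = emb_layer(torch.tensor(ids_a + [sep]))
            embeds = torch.cat([left, e.unsqueeze(0), right], dim=0).unsqueeze(0)
            pred = model(inputs_embeds=embeds).last_hidden_state[0]

            # The two inputs differ in length at the word span, so compare only the
            # shared context positions before and after the word.
            n_b, n_a = 1 + len(ids_b), len(ids_a) + 1
            loss = loss + (pred[:n_b] - target[:n_b]).pow(2).mean()
            loss = loss + (pred[-n_a:] - target[-n_a:]).pow(2).mean()

        loss.backward()
        opt.step()
    return e.detach()
```

The returned vector could then serve as a single-token stand-in for the rare word in BERT's input embedding space, which is what would make an Attentive-Mimicking-style objective applicable to a model with subword-based tokenization.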
For low-resource NLP tasks like Keyword Search and domain adaptation with small amounts of in-domain...
There are two main types of word representations: low-dimensional embeddings and high-dimensional d...
State-of-the-art pre-trained language models have been shown to memorise facts and perform well wi...
Pretraining deep language models has led to large performance gains in NLP. Despite this success, Sc...
Learning high-quality embeddings for rare words is a hard problem because of sparse context informat...
Word embeddings are a key component of high-performing natural language processing (NLP) systems, bu...
Pretraining deep neural networks to perform language modeling - that is, to reconstruct missing word...
Rare word representation has recently enjoyed a surge of interest, owing to the crucial role that ef...
Large pre-trained language models such as BERT have been the driving force behind recent improvement...
State-of-the-art NLP systems represent inputs with word embeddings, but these ...
Real-valued word embeddings have transformed natural language processing (NLP) applications, recogni...
Word embedding is a feature learning technique which aims at mapping words from a vocabulary into ve...