Large pre-trained language models such as BERT have been the driving force behind recent improvements across many NLP tasks. However, BERT is trained only on self-supervised objectives, masked word prediction and next sentence prediction, and has no explicit knowledge of lexical, syntactic or semantic information beyond what it picks up through unsupervised pre-training. We propose a novel method to explicitly inject linguistic information, in the form of word embeddings, into any layer of a pre-trained BERT. When injecting counter-fitted and dependency-based embeddings, the performance improvements on multiple semantic similarity datasets indicate that such information is beneficial and currently missing from the original model. Our qualitative analysis ...
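The abstract does not spell out the injection mechanism, so the sketch below shows just one plausible way to combine external word vectors with a pre-trained BERT via the HuggingFace transformers API: project the external vectors to BERT's hidden size and add them to the input-layer embeddings. The projection layer, the random external_vectors placeholder and the choice of the input layer are assumptions made for illustration, not the paper's actual method.

```python
# Hypothetical sketch: injecting external (e.g. counter-fitted) word vectors
# into BERT's input representations. All names below are illustrative.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

ext_dim = 300                                        # dimensionality of the external embeddings
proj = nn.Linear(ext_dim, bert.config.hidden_size)   # learned projection to BERT's hidden size

enc = tokenizer("the cat sat on the mat", return_tensors="pt")

# BERT's own (sub)word embeddings for the input tokens: (1, seq_len, hidden_size)
bert_embeds = bert.get_input_embeddings()(enc["input_ids"])

# Placeholder external vectors, one per token; in practice these would be
# looked up from counter-fitted or dependency-based embeddings and aligned
# to BERT's subword tokenisation.
external_vectors = torch.randn(enc["input_ids"].shape[1], ext_dim)

# Inject by adding the projected external vectors to BERT's embeddings,
# then feed the combined representation through the encoder.
combined = bert_embeds + proj(external_vectors).unsqueeze(0)
outputs = bert(inputs_embeds=combined, attention_mask=enc["attention_mask"])
```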
Word vector representations play a fundamental role in many NLP applications. ...
Recently, the development of pre-trained language models has brought natural language processing (NL...
Several studies investigated the linguistic information implicitly encoded in Neural Language Models...
Pretraining deep language models has led to large performance gains in NLP. Despite this success, Sc...
Unsupervised pretraining models have been shown to facilitate a wide range of downstream NLP applica...
Though achieving impressive results on many NLP tasks, the BERT-like masked language models (MLM) en...
We tackle the problem of identifying metaphors in text, treated as a sequence tagging task. The pre-...
Contextualized word embeddings, i.e. vector representations for words in context, are naturally seen...
Semantic similarity detection is a fundamental task in natural language understanding. Adding topic ...
One of the most remarkable properties of word embeddings is the fact that they capture certain types...
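The entry above is cut off, but the property it alludes to is commonly illustrated by the vector-offset regularity of word embeddings. The toy example below is purely illustrative (the vectors are made up, not taken from any paper) and only demonstrates the analogy-by-arithmetic idea:

```python
# Illustrative sketch of the vector-offset regularity often cited for word
# embeddings (e.g. king - man + woman ≈ queen). The toy vectors are invented
# for demonstration; real embeddings would come from a trained model such as
# word2vec or GloVe.
import numpy as np

toy_vectors = {
    "king":  np.array([0.8, 0.9, 0.1]),
    "man":   np.array([0.7, 0.1, 0.0]),
    "woman": np.array([0.7, 0.1, 0.9]),
    "queen": np.array([0.8, 0.9, 1.0]),
    "apple": np.array([0.1, 0.5, 0.3]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Solve "man is to king as woman is to ?" by vector arithmetic, excluding
# the query words themselves from the candidate set.
target = toy_vectors["king"] - toy_vectors["man"] + toy_vectors["woman"]
candidates = {w: v for w, v in toy_vectors.items() if w not in {"king", "man", "woman"}}
best = max(candidates, key=lambda w: cosine(target, candidates[w]))
print(best)  # -> "queen" with these toy vectors
```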
Modern text classification models are susceptible to adversarial examples, perturbed versions of the...
Language model-based pre-trained representations have become ubiquitous in nat...
Large pretrained masked language models have become state-of-the-art solutions for many NLP problems...
BERT has achieved impressive performance in several NLP tasks. However, there has been limited inves...
Pretraining deep neural network architectures with a language modeling objective has brought large i...