Although achieving impressive results on many NLP tasks, BERT-like masked language models (MLM) suffer from a discrepancy between pre-training and inference. In light of this gap, we investigate the contextual representations of pre-training and inference from the perspective of word probability distributions. We discover that BERT risks neglecting contextual word similarity during pre-training. To tackle this issue, we propose adding an auxiliary gloss regularizer module to BERT pre-training (GR-BERT) to enhance word semantic similarity. By predicting masked words and aligning contextual embeddings to their corresponding glosses simultaneously, word similarity can be modeled explicitly. We design two architectures for GR-BERT and evaluate our mode...
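As a rough illustration of the joint objective described above, the sketch below combines a standard MLM loss with a gloss-alignment term. The cosine alignment loss, the projection layer, and names such as GlossRegularizedMLM, lambda_gr, and gloss_embeds are assumptions made for illustration only; they are not the paper's actual architecture (the two GR-BERT architectures are not reproduced here).

```python
# Minimal sketch of a gloss-regularized MLM objective (illustrative only).
# Assumes gloss embeddings are produced elsewhere (e.g. by a sentence encoder)
# and aligned to the contextual embedding of each masked word via cosine similarity.
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import BertForMaskedLM

class GlossRegularizedMLM(nn.Module):
    def __init__(self, model_name="bert-base-uncased", gloss_dim=768, lambda_gr=1.0):
        super().__init__()
        self.bert = BertForMaskedLM.from_pretrained(model_name)
        hidden = self.bert.config.hidden_size
        # Projection from the contextual space into the gloss-embedding space.
        self.proj = nn.Linear(hidden, gloss_dim)
        self.lambda_gr = lambda_gr

    def forward(self, input_ids, attention_mask, labels, gloss_embeds, mask_positions):
        # Standard MLM loss over masked positions (labels use -100 elsewhere).
        out = self.bert(input_ids=input_ids,
                        attention_mask=attention_mask,
                        labels=labels,
                        output_hidden_states=True)
        mlm_loss = out.loss

        # Contextual embeddings of the masked tokens from the last layer.
        hidden_states = out.hidden_states[-1]          # (B, T, H)
        ctx = hidden_states[mask_positions]            # (N_masked, H)

        # Gloss regularizer: pull each contextual embedding toward the
        # embedding of its word's gloss (cosine alignment is one assumption
        # among several possible alignment losses).
        ctx = F.normalize(self.proj(ctx), dim=-1)
        gloss = F.normalize(gloss_embeds, dim=-1)       # (N_masked, gloss_dim)
        align_loss = (1.0 - (ctx * gloss).sum(dim=-1)).mean()

        return mlm_loss + self.lambda_gr * align_loss
```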
Pre-trained language models have achieved state-of-the-art accuracies on various text classification...
In this position statement, we wish to contribute to the discussion about how to assess quality and ...
Deep learning models like BERT, a stack of attention layers with an unsupervis...
The often observed unavailability of large amounts of training data typically required by deep learn...
Recent advances in End-to-End (E2E) Spoken Language Understanding (SLU) have been primarily due to e...
Pre-training a language model and then fine-tuning it for downstream tasks has demonstrated state-of...
Large pre-trained language models such as BERT have been the driving force behind recent improvement...
Recently, the development of pre-trained language models has brought natural language processing (NL...
Unsupervised pretraining models have been shown to facilitate a wide range of downstream NLP applica...
When pre-trained on large unsupervised textual corpora, language models are able to store and retri...
Language model-based pre-trained representations have become ubiquitous in nat...
Masked language modeling (MLM), a self-supervised pretraining objective, is widely used in natural l...
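To make the MLM objective concrete, the following is a minimal sketch of the BERT-style input corruption that is commonly reported (15% of tokens selected for prediction, with an 80/10/10 mask/random/keep split); the helper name mask_tokens and its arguments are illustrative, not any specific library's API.

```python
# Minimal sketch of BERT-style MLM input corruption (illustrative only).
import torch

def mask_tokens(input_ids, mask_token_id, vocab_size, mlm_prob=0.15):
    labels = input_ids.clone()
    # Sample the positions the model will be asked to predict.
    probs = torch.full(labels.shape, mlm_prob)
    masked = torch.bernoulli(probs).bool()
    labels[~masked] = -100  # ignore non-selected positions in the loss

    # 80% of selected tokens are replaced with [MASK].
    replace = torch.bernoulli(torch.full(labels.shape, 0.8)).bool() & masked
    input_ids[replace] = mask_token_id

    # 10% are replaced with a random token; the remaining 10% are left unchanged.
    random = torch.bernoulli(torch.full(labels.shape, 0.5)).bool() & masked & ~replace
    input_ids[random] = torch.randint(vocab_size, labels.shape)[random]

    return input_ids, labels
```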
Most recently, there has been significant interest in learning contextual representations for variou...
We propose PromptBERT, a novel contrastive learning method for learning better sentence representati...
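PromptBERT's prompt-based representations and template denoising are not reproduced here; the sketch below only shows the generic in-batch contrastive (InfoNCE) loss that contrastive sentence-embedding methods of this kind typically build on, with the temperature value and function name chosen for illustration.

```python
# Generic in-batch contrastive (InfoNCE) loss over paired sentence embeddings.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.05):
    """z1, z2: (B, D) embeddings of two views of the same sentences."""
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    sim = z1 @ z2.t() / temperature                      # (B, B) cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)  # positives on the diagonal
    return F.cross_entropy(sim, targets)
```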
Transfer learning applies knowledge or patterns learned in a particular field or task to differe...