Pre-trained transformers are a class of neural networks behind many recent natural language processing systems. Their success is often attributed to linguistic knowledge injected during the pre-training process. In this work, we make multiple attempts to surgically remove language-specific knowledge from BERT. Surprisingly, these interventions often do little damage to BERT's performance on GLUE tasks. By contrasting against non-pre-trained transformers with oracle initialization, we argue that when it comes to explaining how BERT works, there is a sizable void below linguistic probing and above model initialization.
Pretrained transformer-based language models achieve state-of-the-art performance in many NLP tasks,...
We evaluate three simple, normalization-centric changes to improve Transformer training. First, we s...
Since the first bidirectional deep learning model for natural language understanding, BERT, emerge...
Can we utilize extremely large monolingual text to improve neural machine translation without the ex...
Recently, the development of pre-trained language models has brought natural language processing (NL...
GPT-2 and BERT demonstrate the effectiveness of using pre-trained language models (LMs) on various n...
In published reviews of natural language pre-training technology, most of the literature only elaborat...
Transfer learning applies knowledge or patterns learned in a particular field or task to differe...
Pre-trained language models have received extensive attention in recent years. However, it is still chall...
One of the major challenges in sign language translation from a sign language to a spoken language i...
Natural language processing (NLP) techniques have significantly improved with the introduction of pre-trained l...
We analyze the Knowledge Neurons framework for the attribution of factual and relational knowledge t...
Pre-training and fine-tuning have achieved great success in the natural language processing field. The stan...
Transformers have been established as one of the most effective neural approaches for performing variou...
Pre-trained transformers have rapidly become very popular in the Natural Language Processing (NLP) c...