Continual pretraining is a standard way of building a domain-specific pretrained language model from a general-domain language model. However, sequential task training may cause catastrophic forgetting, which hurts model performance on downstream tasks. In this paper, we propose a continual pretraining method for BERT-based models, named CBEAF-Adapting (Chinese Biomedical Enhanced Attention-FFN Adapting). Its main idea is to introduce a small number of attention heads and hidden units inside each self-attention layer and feed-forward network. Using the Chinese biomedical domain as a running example, we train a domain-specific language model named CBEAF-RoBERTa. We conduct experiments by applying the models to downstream tasks. The re...
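The visible portion of the CBEAF-Adapting abstract above describes augmenting each self-attention layer and feed-forward network with a small number of extra attention heads and hidden units. Below is a minimal PyTorch sketch of that general idea; the module names, dimensions, and the choice to freeze the pretrained weights while training only the added parameters are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch only: extend a transformer layer with a few new attention heads and
# FFN hidden units. Sizes, names, and the freezing strategy are assumptions.
import torch
import torch.nn as nn


class ExpandedSelfAttention(nn.Module):
    """Pretrained multi-head attention plus a small number of new, trainable heads."""

    def __init__(self, hidden_size=768, orig_heads=12, extra_heads=2):
        super().__init__()
        self.head_dim = hidden_size // orig_heads
        self.extra_heads = extra_heads
        # Pretrained projections (would be loaded from the base model and frozen).
        self.orig_attn = nn.MultiheadAttention(hidden_size, orig_heads, batch_first=True)
        for p in self.orig_attn.parameters():
            p.requires_grad = False
        # Newly introduced heads, trained on the domain corpus.
        self.extra_q = nn.Linear(hidden_size, extra_heads * self.head_dim)
        self.extra_k = nn.Linear(hidden_size, extra_heads * self.head_dim)
        self.extra_v = nn.Linear(hidden_size, extra_heads * self.head_dim)
        self.extra_out = nn.Linear(extra_heads * self.head_dim, hidden_size)

    def forward(self, x):
        frozen_out, _ = self.orig_attn(x, x, x)
        b, t, _ = x.shape
        q = self.extra_q(x).view(b, t, self.extra_heads, self.head_dim).transpose(1, 2)
        k = self.extra_k(x).view(b, t, self.extra_heads, self.head_dim).transpose(1, 2)
        v = self.extra_v(x).view(b, t, self.extra_heads, self.head_dim).transpose(1, 2)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        new_out = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        return frozen_out + self.extra_out(new_out)


class ExpandedFFN(nn.Module):
    """Pretrained feed-forward network plus a small block of new hidden units."""

    def __init__(self, hidden_size=768, orig_inner=3072, extra_inner=256):
        super().__init__()
        self.orig = nn.Sequential(nn.Linear(hidden_size, orig_inner), nn.GELU(),
                                  nn.Linear(orig_inner, hidden_size))
        for p in self.orig.parameters():
            p.requires_grad = False
        self.extra = nn.Sequential(nn.Linear(hidden_size, extra_inner), nn.GELU(),
                                   nn.Linear(extra_inner, hidden_size))

    def forward(self, x):
        return self.orig(x) + self.extra(x)
```

Only the added parameters would be updated during domain-specific pretraining, which keeps the general-domain knowledge in the frozen weights intact while the new heads and hidden units absorb the biomedical domain.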
BERT models used in specialized domains all seem to be the result of a simple ...
Neural network training has been shown to be advantageous in many natural language processing appli...
This paper considers continual learning of large-scale pretrained neural machine translation model w...
As a task requiring strong professional experience for support, predictive biomedical intelligence c...
GPT-2 and BERT demonstrate the effectiveness of using pre-trained language models (LMs) on various n...
Pretrained language models (PLMs) are today the primary model for natural language processing. Despi...
The lifelong learning paradigm in machine learning is an attractive alternative to the more prominen...
Biomedical text mining is becoming increasingly important as the number of biomedical documents and ...
Pre-trained models are nowadays a fundamental component of machine learning research. In continual l...
Recent advances in NLP are brought by a range of large-scale pretrained language models (PLMs). Thes...
Pretrained language models (PTLMs) are typically learned over a large, static corpus and further fin...