Pre-training complex language models is essential to the success of recent methods such as BERT and OpenAI GPT. Their size makes not only the pre-training phase but also subsequent applications computationally expensive. BERT-like models excel at token-level tasks because they provide reliable token embeddings, but they fall short when it comes to sentence- or higher-level structure embeddings, since these models have no built-in mechanism that explicitly provides such representations. We introduce Light and Multigranular BERT, which has a similar number of parameters to BERT but is about 3 times faster, achieved by modifying the input representation, which in turn introduces changes to the attention mechanism...
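The abstract above attributes the speedup to a modified (coarser) input representation, which shortens the sequence the attention mechanism operates over. Below is a minimal, purely illustrative sketch of why that helps: self-attention cost grows roughly quadratically with sequence length, so grouping tokens into coarser units cuts the attention-side cost sharply. The sequence length, hidden size, and granularity factor are assumed values for illustration, not figures from the paper, and this is not the paper's actual architecture.

```python
# Illustrative only: rough per-layer self-attention cost for a token-level
# input versus a hypothetical coarser-grained input that is `granularity`
# times shorter. All numeric settings below are assumptions.

def attention_flops(seq_len: int, hidden: int) -> int:
    """Approximate FLOPs of one self-attention layer:
    4 * n * d^2 for the Q/K/V/output projections, plus
    2 * n^2 * d for the two n-by-n attention matrix products."""
    return 4 * seq_len * hidden ** 2 + 2 * seq_len ** 2 * hidden

tokens = 512        # token-level input length (assumed)
hidden = 768        # BERT-base hidden size
granularity = 4     # hypothetical factor by which coarser units shorten the input

fine = attention_flops(tokens, hidden)
coarse = attention_flops(tokens // granularity, hidden)
print(f"token-level input:   {fine:,} FLOPs")
print(f"coarse-grained input: {coarse:,} FLOPs (~{fine / coarse:.1f}x fewer)")
```

The projection cost (4nd^2) scales only linearly with sequence length, so the overall speedup from a shorter input is less than quadratic but can still be substantial, which is consistent in spirit with the reported ~3x figure.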
Can we utilize extremely large monolingual text to improve neural machine translation without the ex...
Large-scale pretraining and task-specific fine-tuning is now the standard methodology for many tasks...
Masked language modeling (MLM), a self-supervised pretraining objective, is widely used in natural l...
Pre-trained language models have been dominating the field of natural language processing in recent ...
Large pre-trained masked language models have become state-of-the-art solutions for many NLP problem...
Transfer learning is the application of knowledge or patterns learned in a particular field or task to differe...
Large pretrained masked language models have become state-of-the-art solutions for many NLP problems...
Recently, the development of pre-trained language models has brought natural language processing (NL...
Large pretrained masked language models have become state-of-the-art solutions for many NLP problems...
Despite the widespread use of pre-trained models in NLP, well-performing pre-trained mo...
In this position statement, we wish to contribute to the discussion about how to assess quality and ...
Currently, the most widespread neural network architecture for training language models is the so-ca...
Transformer-based masked language models trained on general corpora, such as BERT and RoBERTa, have ...
We propose PromptBERT, a novel contrastive learning method for learning better sentence representati...
Transformer-based language models have become a key building block for natural language processing. ...