While GPTs with traditional fine-tuning fail to achieve strong results on natural language understanding (NLU), we show that GPTs can be better than or comparable to similar-sized BERTs on NLU tasks with a novel method, P-tuning, which employs trainable continuous prompt embeddings. On the knowledge probing (LAMA) benchmark, the best GPT recovers 64% (P@1) of world knowledge without any additional text provided during test time, which substantially improves the previous best by 20+ percentage points. On the SuperGLUE benchmark, GPTs achieve performance comparable to, and sometimes better than, similar-sized BERTs in supervised learning. Importantly, we find that P-tuning also improves BERTs' performance in both few-shot and supervised settings w...
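The core idea named in this abstract, trainable continuous prompt embeddings, can be illustrated with a minimal, hypothetical PyTorch sketch. The class name, prompt length, and initialization below are assumptions for illustration only; the published P-tuning method additionally passes the prompt vectors through an LSTM/MLP prompt encoder and inserts them at template-specific positions rather than simply prepending them.

```python
import torch
import torch.nn as nn

class ContinuousPrompt(nn.Module):
    """Sketch of trainable continuous prompt embeddings (hypothetical names/shapes).

    The prompt vectors are the only parameters that need gradient updates;
    the backbone language model can remain frozen.
    """

    def __init__(self, num_prompt_tokens: int, embed_dim: int):
        super().__init__()
        # Learnable "pseudo token" embeddings, optimized instead of discrete prompt words.
        self.prompt = nn.Parameter(torch.randn(num_prompt_tokens, embed_dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # Prepend the continuous prompt to each example's token embeddings.
        batch_size = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch_size, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)

# Usage sketch: extend the embedded input, then feed the result to a (frozen) LM.
prompt_layer = ContinuousPrompt(num_prompt_tokens=8, embed_dim=768)
dummy_embeds = torch.randn(4, 32, 768)       # (batch, seq_len, hidden), toy values
extended = prompt_layer(dummy_embeds)        # -> shape (4, 40, 768)
```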
Pre-trained multilingual language models show significant performance gains for zero-shot cross-ling...
Prompt learning is a new paradigm in the Natural Language Processing (NLP) field which has shown imp...
Fine-tuning BERT-based models is resource-intensive in memory, computation, and time. While many pri...
When scaled to hundreds of billions of parameters, pretrained language models such as GPT-3 (Brown e...
Language models, such as GPT-3.5 and ChatGPT, demonstrate remarkable abilities to follow diverse hum...
Prompt tuning attempts to update only a few task-specific parameters in pre-trained models. It has achieved...
Pretrained large language models (LLMs) are strong in-context learners that are able to perform few-...
Prompt tuning learns soft prompts to condition frozen Pre-trained Language Models (PLMs) for perform...
Prompt tuning has become a new paradigm for model tuning and it has demonstrated success in natural ...
GPT-2 and BERT demonstrate the effectiveness of using pre-trained language models (LMs) on various n...
In recent years, there has been significant progress in developing pre-trained language models for N...
Prompt-based fine-tuning has boosted the performance of Pre-trained Language Models (PLMs) on few-sh...
Pretrained language models can be effectively stimulated by textual prompts or demonstrations, espec...
Speech representations learned from self-supervised learning (SSL) models can benefit various speech...
We introduce BitFit, a sparse-finetuning method where only the bias-terms of the model (or a subset ...
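The BitFit entry above hinges on updating only bias terms while freezing everything else. Below is a minimal, hypothetical sketch of that parameter selection; the helper name and the toy model are assumptions, and the original work applies this to a pretrained BERT-style encoder rather than an arbitrary module.

```python
import torch
from torch import nn

def mark_bias_only_trainable(model: nn.Module) -> list:
    """Freeze every parameter except bias terms (BitFit-style sparse fine-tuning sketch)."""
    trainable = []
    for name, param in model.named_parameters():
        if name.endswith(".bias") or name == "bias":
            param.requires_grad = True
            trainable.append(name)
        else:
            param.requires_grad = False
    return trainable

# Usage sketch with a toy model; in practice this would be a pretrained encoder.
toy_model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 2))
bias_names = mark_bias_only_trainable(toy_model)
optimizer = torch.optim.AdamW(
    [p for p in toy_model.parameters() if p.requires_grad], lr=1e-4
)
```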