We propose ZeroPrompt, a multitask pretraining approach for zero-shot generalization that focuses on task scaling and zero-shot prompting. While previous models are trained on only a few dozen tasks, we scale to 1,000 tasks for the first time using real-world data. This leads to a crucial discovery: task scaling can be an efficient alternative to model scaling, i.e., model size has little impact on performance once the number of tasks is extremely large. Our results show that task scaling can improve training efficiency by 30 times in FLOPs. Moreover, we present a prompting method that uses a genetic algorithm to automatically search for the best prompt for unseen tasks, along with a few other improvements. Empirically...
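The prompt-search component lends itself to a short illustration. Below is a minimal sketch of a genetic-algorithm search over discrete prompt tokens, assuming a hypothetical `score_prompt` fitness function and token pool (in practice the fitness would be the frozen model's zero-shot performance under the candidate prompt); this is a sketch under those assumptions, not the paper's actual implementation.

```python
import random

# Illustrative genetic-algorithm prompt search; TOKEN_POOL and score_prompt
# are hypothetical stand-ins, not the paper's implementation.
TOKEN_POOL = ["Question:", "Answer:", "Is", "it", "correct", "that", "the", "following", "?"]

def score_prompt(prompt_tokens):
    """Placeholder fitness: in practice, zero-shot accuracy of a frozen PLM
    on a held-out set when conditioned on the candidate prompt."""
    return random.random()

def mutate(prompt_tokens, rate=0.2):
    # Randomly replace tokens with probability `rate`.
    return [random.choice(TOKEN_POOL) if random.random() < rate else t
            for t in prompt_tokens]

def crossover(a, b):
    # Single-point crossover of two parent prompts.
    cut = random.randint(1, min(len(a), len(b)) - 1)
    return a[:cut] + b[cut:]

def genetic_prompt_search(pop_size=20, prompt_len=6, generations=10):
    population = [[random.choice(TOKEN_POOL) for _ in range(prompt_len)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=score_prompt, reverse=True)
        parents = scored[: pop_size // 2]  # keep the fittest half
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=score_prompt)

print(" ".join(genetic_prompt_search()))
```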
Prompt tuning learns soft prompts to condition frozen Pre-trained Language Models (PLMs) for perform...
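As a point of reference for how soft prompt tuning conditions a frozen PLM, here is a minimal PyTorch-style sketch: a short sequence of trainable "virtual token" embeddings is prepended to the input embeddings while all PLM weights stay frozen. The names `plm` and `SoftPrompt` are illustrative assumptions, not a specific library's API.

```python
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    """Trainable soft prompt prepended to a frozen PLM's input embeddings."""
    def __init__(self, embed_dim, prompt_len=20):
        super().__init__()
        # The only trainable parameters: prompt_len virtual-token embeddings.
        self.prompt = nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)

    def forward(self, token_embeds):
        # token_embeds: (batch, seq_len, dim) from the frozen PLM's embedding table.
        batch = token_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, token_embeds], dim=1)  # prepend soft prompt

# Usage sketch (assumed frozen model `plm` that accepts inputs_embeds):
# for p in plm.parameters():
#     p.requires_grad = False
# soft_prompt = SoftPrompt(embed_dim=768)
# optimizer = torch.optim.AdamW(soft_prompt.parameters(), lr=1e-3)
# outputs = plm(inputs_embeds=soft_prompt(embedded_tokens), labels=labels)
```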
Recent work has found that multi-task training with a large number of diverse tasks can uniformly im...
Curriculum learning strategies in prior multi-task learning approaches arrange datasets in a difficu...
Large language models have recently been shown to attain reasonable zero-shot ...
Nowadays, owing to the superior capacity of the large pre-trained language models (PLM), the PLM-bas...
Pretrained language models have shown success in various areas of natural language processing, inclu...
One of the most impressive results of recent NLP history is the ability of pre-trained language mode...
Pretrained language models (PLMs) have demonstrated remarkable performance in various natural langua...
Pretrained large language models (LLMs) are widely used in many sub-fields of natural language proce...
Enhancing the zero-shot performance of instruction-following models requires heavy computation, eith...
Black-Box Tuning (BBT) is a derivative-free approach to optimize continuous prompt tokens prepended ...
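For illustration, a simplified sketch of derivative-free prompt optimization in the spirit of BBT: a low-dimensional vector is projected into the prompt-embedding space by a fixed random matrix and searched without gradients against a black-box objective. The `black_box_loss` placeholder and the simple evolutionary loop are assumptions made for readability; the original method uses CMA-ES and queries a real frozen PLM through its inference interface.

```python
import numpy as np

# Simplified derivative-free prompt optimization sketch (not the BBT implementation).
rng = np.random.default_rng(0)
D_Z, PROMPT_LEN, EMBED_DIM = 50, 20, 768
# Fixed random projection from the low-dimensional search space to prompt embeddings.
A = rng.normal(size=(PROMPT_LEN * EMBED_DIM, D_Z)) / np.sqrt(D_Z)

def black_box_loss(prompt_embeds):
    """Placeholder: in practice, feed the prompt embeddings to the frozen PLM
    via an inference API and return the task loss on a small validation set."""
    return float(np.sum(prompt_embeds ** 2))

def prompt_from_z(z):
    return (A @ z).reshape(PROMPT_LEN, EMBED_DIM)

# Simple (1+lambda) evolutionary search; BBT itself uses CMA-ES instead.
z_best = np.zeros(D_Z)
loss_best = black_box_loss(prompt_from_z(z_best))
for step in range(200):
    candidates = z_best + 0.1 * rng.normal(size=(8, D_Z))
    losses = [black_box_loss(prompt_from_z(z)) for z in candidates]
    i = int(np.argmin(losses))
    if losses[i] < loss_best:
        z_best, loss_best = candidates[i], losses[i]
print("best loss:", loss_best)
```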
Prompt-based learning has been an effective paradigm for large pretrained language models (LLM), ena...
Zero-shot cross-domain slot filling aims to transfer knowledge from the labeled source domain to the...
While developments in machine learning led to impressive performance gains on big data, many human s...
In this paper, we consider the framework of multi-task representation (MTR) learning where the goal ...