Encoder-decoder transformer architectures have become popular recently with the advent of T5 models. They are also more favorable than architectures like BERT for pre-training on language modeling tasks when it comes to large-scale models that can take months to train, given their generality. While able to generalize to more tasks, it is not evident whether the encoder-decoder architecture is the most efficient choice for fine-tuning on classification and regression tasks given the pre-trained model. In this work, we study fine-tuning pre-trained encoder-decoder models such as T5. In particular, we propose \textbf{EncT5} as a way to efficiently fine-tune pre-trained encoder-decoder T5 models for classification and regression tasks by using the ...
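As a rough illustration of the idea sketched in the abstract above, the following is a minimal sketch, not the paper's exact recipe, of fine-tuning only the T5 encoder with a small classification head on top. The checkpoint name, mean pooling, and the linear head are assumptions for illustration.

```python
import torch
import torch.nn as nn
from transformers import T5EncoderModel, T5Tokenizer

class T5EncoderClassifier(nn.Module):
    """Hypothetical EncT5-style classifier: T5 encoder + linear head."""

    def __init__(self, model_name="t5-base", num_labels=2):
        super().__init__()
        # Load only the encoder stack of a pre-trained T5 checkpoint.
        self.encoder = T5EncoderModel.from_pretrained(model_name)
        self.head = nn.Linear(self.encoder.config.d_model, num_labels)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # Mean-pool over non-padding tokens (one of several reasonable choices;
        # the paper's exact pooling is an assumption here).
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (out.last_hidden_state * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
        return self.head(pooled)

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5EncoderClassifier()
batch = tokenizer(["great movie!", "terrible plot"], padding=True, return_tensors="pt")
logits = model(batch["input_ids"], batch["attention_mask"])
```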
In recent years, many interpretability methods have been proposed to help interpret the internal sta...
Adjusting the latency, power, and accuracy of natural language understanding models is a desirable o...
Pretrained Transformers achieve state-of-the-art performance in various code-processing tasks but ma...
State-of-the-art language models like T5 have revolutionized the NLP landscape, but their computatio...
Recent advances in Transformer-based Large Language Models have made great strides in natural langua...
We introduce BitFit, a sparse-finetuning method where only the bias-terms of the model (or a subset ...
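For context, a minimal sketch of the bias-only fine-tuning idea described above, assuming the Hugging Face transformers API; the base checkpoint and the way the task head is kept trainable are illustrative assumptions, not BitFit's exact setup.

```python
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Freeze everything except bias terms (and, here, the task head).
for name, param in model.named_parameters():
    param.requires_grad = name.endswith("bias") or "classifier" in name

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable} / {total}")

# Only the unfrozen parameters are handed to the optimizer.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```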
T5 Model (@patrickvonplaten, @thomwolf) T5 is a powerful encoder-decoder model that formats every N...
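A minimal usage sketch of the text-to-text interface mentioned above, assuming the Hugging Face transformers API; the checkpoint name and prompt are illustrative.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Every task is phrased as text in, text out.
inputs = tokenizer(
    "translate English to German: The house is wonderful.", return_tensors="pt"
)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```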
State-of-the-art encoder-decoder models (e.g. for machine translation (MT) or speech recognition (AS...
The powerful modeling capabilities of all-attention-based transformer architectures often cause over...
Encoder-decoder transformer models have achieved great success on various vision-language (VL) tasks...
This paper studies a novel pre-training technique with unpaired speech data, Speech2C, for encoder-d...
Task-conditional architecture offers an advantage in parameter efficiency but falls short in performanc...
Prompting and adapter tuning have emerged as efficient alternatives to fine-tuning (FT) methods. How...
Recently, the development of pre-trained language models has brought natural language processing (NL...
Auto-encoders play a fundamental role in unsupervised feature learning and learning initial paramete...