Prompting and adapter tuning have emerged as efficient alternatives to fine-tuning (FT). However, existing studies of speech prompting have focused on classification tasks and failed on more complex sequence generation tasks, and adapter tuning has been applied primarily to encoder-only self-supervised models. Our experiments show that prompting on Wav2Seq, a self-supervised encoder-decoder model, surpasses previous work on sequence generation tasks, achieving a 53% relative improvement in word error rate for ASR and a 27% relative improvement in F1 score for slot filling. Prompting also remains competitive with FT in the low-resource scenario. Moreover, we show the transferability of prompting and adapter tuning on Wav2Seq...
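The core idea behind the prompt tuning discussed above is to keep the pre-trained model frozen and learn only a small set of prompt vectors that are prepended to the input embedding sequence. The sketch below is a minimal, hypothetical illustration of that mechanism (not the paper's actual Wav2Seq implementation); the array shapes and the function name `prepend_prompts` are assumptions for illustration.

```python
import numpy as np

def prepend_prompts(input_embeds, prompt_embeds):
    """Prepend trainable prompt vectors to a frozen model's input embeddings.

    input_embeds:  (seq_len, dim) embeddings of the speech input frames
    prompt_embeds: (n_prompts, dim) trainable prompt vectors (the only
                   parameters updated during prompt tuning)
    Returns an extended sequence of shape (n_prompts + seq_len, dim).
    """
    return np.concatenate([prompt_embeds, input_embeds], axis=0)

# Toy example: 5 input frames of dimension 4, with 3 prompt vectors.
rng = np.random.default_rng(0)
inputs = rng.standard_normal((5, 4))   # produced by the frozen feature extractor
prompts = np.zeros((3, 4))             # initialized here to zeros; trained in practice
extended = prepend_prompts(inputs, prompts)
print(extended.shape)  # (8, 4)
```

Only `prompts` would receive gradient updates during training, which is what makes the method parameter-efficient relative to full fine-tuning.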
This paper studies a novel pre-training technique with unpaired speech data, Speech2C, for encoder-d...
Pre-trained speech Transformers have facilitated great success across various speech processing task...
Self-supervised representation learning (SSRL) has improved the performance on downstream phoneme re...
Speech representations learned from Self-supervised learning (SSL) models can benefit various speech...
Automatic Speech Recognition (ASR) systems have found their use in numerous industrial applications ...
The advent of hyper-scale and general-purpose pre-trained models is shifting the paradigm of buildin...
Advances in self-supervised learning have significantly reduced the amount of transcribed audio requ...
Learning a set of tasks in sequence remains a challenge for artificial neural networks, which, in su...
Self-supervised pre-training could effectively improve the performance of low-resource automatic spe...
In recent years, self-supervised learning paradigm has received extensive attention due to its great...
Self-supervised learning (SSL) has shown tremendous success in various speech-related downstream tas...
While Automatic Speech Recognition (ASR) models have shown significant advances with the introductio...
There is growing interest in unifying the streaming and full-context automatic speech recognition (A...
Prompt tuning has become a new paradigm for model tuning and it has demonstrated success in natural ...
Automatic speech recognition models are often adapted to improve their accuracy in a new domain. A p...