The advent of hyper-scale, general-purpose pre-trained models is shifting the paradigm of building task-specific models for target tasks. In the field of audio research, task-agnostic pre-trained models with high transferability and adaptability have achieved state-of-the-art performance through fine-tuning on downstream tasks. Nevertheless, re-training all the parameters of these massive models entails enormous time and cost, along with a huge carbon footprint. To overcome these limitations, the present study explores and applies efficient transfer learning methods in the audio domain. We also propose an integrated parameter-efficient tuning (IPET) framework by aggregating the embedding prompt (a prompt-based learning app...
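The embedding prompt named in the abstract above follows the soft-prompt recipe: a small set of learnable vectors is prepended to the frozen backbone's input sequence, so only those vectors (plus a task head) receive gradients. Below is a minimal PyTorch sketch of that idea, not the paper's implementation; EmbeddingPrompt, encoder, and num_prompts are hypothetical names, and the encoder is assumed to accept a (batch, seq_len, embed_dim) tensor of audio frame embeddings.

```python
import torch
import torch.nn as nn

class EmbeddingPrompt(nn.Module):
    """Prepend learnable prompt vectors to a frozen encoder's input.

    A minimal sketch of the embedding-prompt idea; names and the
    encoder interface are assumptions, not the paper's API.
    """

    def __init__(self, encoder: nn.Module, embed_dim: int, num_prompts: int = 16):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad = False  # the pre-trained backbone stays frozen
        # Learnable prompt embeddings: the only new trainable parameters here.
        self.prompts = nn.Parameter(torch.randn(num_prompts, embed_dim) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, embed_dim) audio frame embeddings.
        batch = x.size(0)
        prompts = self.prompts.unsqueeze(0).expand(batch, -1, -1)
        # Prompts are concatenated in front of the input sequence.
        return self.encoder(torch.cat([prompts, x], dim=1))
```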
We introduce BitFit, a sparse-finetuning method where only the bias-terms of the model (or a subset ...
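BitFit's mechanism is compact enough to show directly: every parameter is frozen except the bias terms. A minimal PyTorch sketch, assuming bias parameters are registered under names ending in "bias" (true for standard modules such as nn.Linear and nn.LayerNorm); apply_bitfit is a hypothetical helper name, not the authors' code.

```python
import torch.nn as nn

def apply_bitfit(model: nn.Module) -> int:
    """Freeze everything except bias terms, per the BitFit recipe.

    Returns the number of parameters left trainable so the budget
    can be compared against full fine-tuning.
    """
    trainable = 0
    for name, param in model.named_parameters():
        # Only parameters whose registered name ends in "bias" are tuned.
        param.requires_grad = name.endswith("bias")
        if param.requires_grad:
            trainable += param.numel()
    return trainable
```

After calling the helper, the optimizer should be built only from the surviving trainable parameters, e.g. torch.optim.AdamW(filter(lambda p: p.requires_grad, model.parameters())).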
Automatic Speech Recognition (ASR) systems have found their use in numerous industrial applications ...
Large language models (LLMs) and vision language models (VLMs) demonstrate excellent performance on ...
Audio classification plays a crucial role in speech and sound processing tasks with a wide range of ...
Speech representations learned from self-supervised learning (SSL) models can benefit various speech...
The current modus operandi in adapting pre-trained models involves updating all the backbone paramet...
Prompting and adapter tuning have emerged as efficient alternatives to fine-tuning (FT) methods. How...
Prompt tuning has become a new paradigm for model tuning and has demonstrated success in natural ...
Learning general-purpose music representations offers the flexibility to fine-tune several d...
Despite significant advancements in deep learning for vision and natural language, unsupervised doma...
Fine-tuning large language models for different tasks can be costly and inefficient, and even method...
In recent years, the self-supervised learning paradigm has received extensive attention due to its great...
In this work, we provide a broad comparative analysis of strategies for pre-training audio understan...
The success of supervised deep learning methods is largely due to their ability to learn relevant fe...
Existing fine-tuning methods either tune all parameters of the pre-trained model (full fine-tuning),...
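The contrast this last abstract sets up, full fine-tuning versus tuning only a subset of parameters, is easy to quantify by measuring the trainable-parameter budget before and after applying a method such as BitFit or the embedding prompt sketched above. A small utility for that comparison (parameter_budget is a hypothetical helper):

```python
import torch.nn as nn

def parameter_budget(model: nn.Module) -> tuple[int, int]:
    """Return (trainable, total) parameter counts for a model.

    Full fine-tuning trains 100% of parameters; parameter-efficient
    methods typically leave only a small fraction trainable.
    """
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable, total
```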