We study the usability of pre-trained weakly supervised audio tagging (AT) models as feature extractors for general audio representations. Our main focus is how well these embeddings transfer to other tasks within the speech and sound domains. Specifically, we benchmark weakly supervised pre-trained models (MobileNetV2 and EfficientNet-B0) as feature extractors against a modern self-supervised learning method (BYOL-A). Fourteen downstream tasks, ranging from musical instrument classification to language classification, are used for evaluation. Our results indicate that AT pre-trained models are an excellent choice for transfer learning in music, event, and emotion recognition tasks. Further, finetuning AT models can also benefit speec...
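The frozen-feature protocol described above keeps a pre-trained encoder fixed, extracts one clip-level embedding per recording, and trains only a lightweight classifier for each downstream task. The following is a minimal sketch under stated assumptions: the hypothetical `frozen_encoder` is a fixed random projection with temporal mean pooling standing in for a pre-trained AT backbone (it is not MobileNetV2, EfficientNet-B0, or BYOL-A), the downstream head is a multinomial logistic regression trained with plain gradient descent, and the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a frozen pre-trained AT backbone: a fixed random
# projection of each frame, ReLU, then mean pooling over time. Never updated.
N_MELS, EMB_DIM = 64, 128
W_FROZEN = rng.normal(size=(N_MELS, EMB_DIM)) / np.sqrt(N_MELS)

def frozen_encoder(log_mel: np.ndarray) -> np.ndarray:
    """Map a (frames, n_mels) input to a fixed clip-level embedding."""
    frame_emb = np.maximum(log_mel @ W_FROZEN, 0.0)  # (frames, EMB_DIM)
    return frame_emb.mean(axis=0)                   # temporal mean pooling

def train_linear_probe(x, y, n_classes, lr=0.03, epochs=500):
    """Multinomial logistic regression on frozen embeddings; only w and b are learned."""
    w = np.zeros((x.shape[1], n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    for _ in range(epochs):
        logits = x @ w + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / len(x)              # d(mean cross-entropy)/d(logits)
        w -= lr * x.T @ grad
        b -= lr * grad.sum(axis=0)
    return w, b

def predict(x, w, b):
    return np.argmax(x @ w + b, axis=1)

# Synthetic downstream task (not a real benchmark): each class raises a
# different band of "mel" bins.
def make_clip(label, n_classes=3, frames=50):
    clip = rng.normal(size=(frames, N_MELS))
    band = N_MELS // n_classes
    clip[:, label * band:(label + 1) * band] += 2.0
    return clip

labels = rng.integers(0, 3, size=300)
feats = np.stack([frozen_encoder(make_clip(lab)) for lab in labels])

# Standardize embeddings with training-split statistics only.
mu, sd = feats[:200].mean(axis=0), feats[:200].std(axis=0) + 1e-8
feats = (feats - mu) / sd

w, b = train_linear_probe(feats[:200], labels[:200], n_classes=3)
test_acc = (predict(feats[200:], w, b) == labels[200:]).mean()
print(f"linear-probe test accuracy: {test_acc:.2f}")
```

Because the encoder is never updated, the expensive forward pass can be run once and cached, and each downstream task only costs a small convex optimization; finetuning would instead also update `W_FROZEN`.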
Large-scale sound recognition data sets typically consist of acoustic recordings obtained from multi...
In this work, we provide a broad comparative analysis of strategies for pre-training audio understan...
Self-supervised audio representation learning offers an attractive alternative for obtaining generic...
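The self-supervised baseline named in the abstract, BYOL-A, builds on BYOL, whose core objective can be sketched as follows. An online network's prediction for one augmented view is regressed onto a target network's projection of another view with a normalized mean-squared error, $\lVert \bar{q} - \bar{z} \rVert_2^2 = 2 - 2\cos(q, z)$. The target weights then track the online weights through an exponential moving average, $\xi \leftarrow \tau \xi + (1 - \tau)\theta$. This NumPy sketch shows only those two pieces. The linear "networks", the noise-only augmentations, and the value of `tau` are illustrative assumptions and do not reproduce BYOL-A's audio-specific pipeline.

```python
import numpy as np

def normalized_mse(p: np.ndarray, z: np.ndarray) -> np.ndarray:
    """BYOL regression loss per row: ||p/|p| - z/|z|||^2 = 2 - 2 * cosine(p, z)."""
    p = p / np.linalg.norm(p, axis=-1, keepdims=True)
    z = z / np.linalg.norm(z, axis=-1, keepdims=True)
    return 2.0 - 2.0 * np.sum(p * z, axis=-1)

def ema_update(target: dict, online: dict, tau: float = 0.99) -> dict:
    """Target follows online weights: xi <- tau * xi + (1 - tau) * theta."""
    return {k: tau * target[k] + (1.0 - tau) * online[k] for k in target}

rng = np.random.default_rng(0)
dim_in, dim_out = 16, 8

# Hypothetical linear maps standing in for encoder+projector and predictor.
online = {"proj": rng.normal(size=(dim_in, dim_out)), "pred": np.eye(dim_out)}
target = {"proj": online["proj"].copy()}  # target starts as a copy of online

x = rng.normal(size=(4, dim_in))              # batch of 4 inputs
view1 = x + 0.1 * rng.normal(size=x.shape)    # two "augmented" views (noise only)
view2 = x + 0.1 * rng.normal(size=x.shape)

# Online branch predicts; target branch is treated as a constant (stop-gradient in BYOL).
q1 = view1 @ online["proj"] @ online["pred"]
q2 = view2 @ online["proj"] @ online["pred"]
z1 = view1 @ target["proj"]
z2 = view2 @ target["proj"]
loss = np.mean(normalized_mse(q1, z2) + normalized_mse(q2, z1))  # symmetrized
print(f"symmetrized BYOL loss: {loss:.4f}")

# Pretend an optimizer step moved the online projector (gradients omitted here),
# then let the target drift toward it.
old_target = target["proj"].copy()
online["proj"] = online["proj"] + 1.0
target = ema_update(target, {"proj": online["proj"]}, tau=0.99)
```

Since the target starts equal to the online projector, one EMA step after shifting the online weights by 1.0 moves every target entry by exactly 0.01, which illustrates how slowly the target network drifts for `tau` close to 1.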
Methods for extracting audio and speech features have been studied since pioneering work on spectrum...
Audio tagging is the task of predicting the presence or absence of sound classes within an audio cli...
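The presence/absence formulation above makes audio tagging a multi-label problem: each sound class receives an independent probability instead of competing in a softmax, so a clip may carry zero, one, or several tags. A minimal sketch, assuming a per-class sigmoid output, binary cross-entropy (BCE) loss, and a fixed 0.5 decision threshold. These are common choices, but the label set, logits, and threshold here are hypothetical and not taken from any cited work.

```python
import numpy as np

CLASSES = ["speech", "dog_bark", "music", "siren"]  # hypothetical label set

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

def binary_cross_entropy(logits: np.ndarray, targets: np.ndarray) -> float:
    """Mean BCE over classes, computed stably from logits.

    max(x, 0) - x*y + log(1 + exp(-|x|)) equals
    -y*log(sigmoid(x)) - (1 - y)*log(1 - sigmoid(x)).
    """
    loss = np.maximum(logits, 0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))
    return float(loss.mean())

def tag_clip(logits: np.ndarray, threshold: float = 0.5) -> list:
    """Return every class whose probability exceeds the threshold (possibly none)."""
    probs = sigmoid(logits)
    return [c for c, p in zip(CLASSES, probs) if p > threshold]

# One clip containing both speech and music: a multi-hot target vector.
logits = np.array([2.3, -1.7, 0.9, -3.0])
targets = np.array([1.0, 0.0, 1.0, 0.0])
print(tag_clip(logits))  # → ['speech', 'music']
print(round(binary_cross_entropy(logits, targets), 4))
```

Computing BCE from logits rather than from thresholded or sigmoid outputs avoids `log(0)` for confident predictions, and the threshold can be tuned per class on validation data without retraining.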
Pre-trained models are essential as feature extractors in modern machine learning systems in various...
The success of supervised deep learning methods is largely due to their ability to learn relevant fe...
Within the audio research community and the industry, keyword spotting (KWS) and audio tagging (AT) ...
Learning music representations that are general-purpose offers the flexibility to finetune several d...
Automatic music tagging systems have once more gained relevance over the last years, not least throu...
Recently, a number of semi-supervised learning (SSL) methods, in the framework...