The goal of universal audio representation learning is to obtain foundational models that can be used for a variety of downstream tasks involving speech, music, or environmental sounds. To approach this problem, methods inspired by self-supervised models from NLP, such as BERT, are often adapted to audio. These models rely on the discrete nature of text, so adopting this type of approach for audio processing requires either changing the learning objective or mapping the audio signal to a set of discrete classes. In this work, we explore the use of EnCodec, a neural audio codec, to generate discrete targets for learning a universal audio model based on a masked autoencoder (MAE). We evaluate this approach, which we call EncodecMA...
Learning music representations that are general-purpose offers the flexibility to finetune several d...
Audio-visual speech enhancement aims to extract clean speech from a noisy environment by leveraging ...
Mainstream Audio Analytics models are trained to learn under the paradigm of one class label to many...
Methods for extracting audio and speech features have been studied since pioneering work on spectrum...
Can we leverage the audiovisual information already present in video to improve self-supervised repr...
We present a new Self-Supervised Learning (SSL) approach to pre-train encoders on unlabeled audio da...
Self-supervised language models are very effective at predicting high-level cortical responses durin...
Traditionally, research in automated speech recognition has focused on local-first encoding of audio...
Pre-trained models are essential as feature extractors in modern machine learning systems in various...
In this paper, we introduce DECAR (DEep Clustering for learning general-purpose Audio Representation...
The success of supervised deep learning methods is largely due to their ability to learn relevant fe...
Automated Audio Captioning (AAC) involves generating natural language descriptions of audio content,...
We study the usability of pre-trained weakly supervised audio tagging (AT) models as feature extract...
We introduce Generative Spoken Language Modeling, the task of learning the aco...
Inspired by the recent progress in self-supervised learning for computer vision, in this paper we in...