In this paper, we introduce DECAR (DEep Clustering for learning general-purpose Audio Representations), a self-supervised pre-training approach for learning general-purpose audio representations. Our system is based on clustering: it utilizes an offline clustering step to produce pseudo-labels and trains the network with a classification loss supervised by these pseudo-labels. We develop on top of recent advances in self-supervised learning for computer vision and design a lightweight, easy-to-use, self-supervised pre-training scheme for learning audio representations. We pre-train DECAR embeddings on a balanced subset of the large-scale AudioSet dataset and FSD50K and evaluate our representations on the LAPE Benchmark consisting of 11 down...
Learning rich visual representations using contrastive self-supervised learning has been extremely s...
Self-supervised representation learning methods aim to provide powerful deep feature learning withou...
The success of supervised deep learning methods is largely due to their ability to learn relevant fe...
We present a new Self-Supervised Learning (SSL) approach to pre-train encoders on unlabeled audio da...
Inspired by the recent progress in self-supervised learning for computer vision, in this paper we in...
Although supervised deep learning has revolutionized speech and audio processing, it has necessitate...
Methods for extracting audio and speech features have been studied since pioneering work on spectrum...
Pre-trained models are essential as feature extractors in modern machine learning systems in various...
Self-supervised learning technique is an under-explored topic for music audio due to the challenge o...
Mainstream Audio Analytics models are trained to learn under the paradigm of one class label to many...
In this paper, we work on a sound recognition system that continually incorporates new sound classes...
Deep neural networks trained with supervised learning algorithms on large amounts of labeled speech ...
Deep learning has fueled an explosion of applications, yet training deep neural networks usually req...
The goal of universal audio representation learning is to obtain foundational models that can be use...
Large-scale databases with high-quality manual labels are scarce in audio domain. We thus explore a ...
Learning rich visual representations using contrastive self-supervised learning has been extremely s...
Self-supervised representation learning methods aim to provide powerful deep feature learning withou...
The success of supervised deep learning methods is largely due to their ability to learn relevant fe...
We present a new Self-Supervised Learning (SSL) approach to pre-train encoders on unlabeled audio da...
Inspired by the recent progress in self-supervised learning for computer vision, in this paper we in...
Although supervised deep learning has revolutionized speech and audio processing, it has necessitate...
Methods for extracting audio and speech features have been studied since pioneering work on spectrum...
Pre-trained models are essential as feature extractors in modern machine learning systems in various...
Self-supervised learning technique is an under-explored topic for music audio due to the challenge o...
Mainstream Audio Analytics models are trained to learn under the paradigm of one class label to many...
In this paper, we work on a sound recognition system that continually incorporates new sound classes...
Deep neural networks trained with supervised learning algorithms on large amounts of labeled speech ...
Deep learning has fueled an explosion of applications, yet training deep neural networks usually req...
The goal of universal audio representation learning is to obtain foundational models that can be use...
Large-scale databases with high-quality manual labels are scarce in audio domain. We thus explore a ...
Learning rich visual representations using contrastive self-supervised learning has been extremely s...
Self-supervised representation learning methods aim to provide powerful deep feature learning withou...
The success of supervised deep learning methods is largely due to their ability to learn relevant fe...