Self-supervised learning is an under-explored technique for music audio due to the challenge of designing an appropriate training paradigm. We hence propose MAP-MERT, a large-scale music audio pre-trained model for general music understanding. We achieve performance comparable to the state-of-the-art pre-trained model Jukebox using fewer than 2% of its parameters.
The lack of data tends to limit the outcomes of deep learning research, particularly when dealing wi...
Can we leverage the audiovisual information already present in video to improve self-supervised repr...
Large-scale databases with high-quality manual labels are scarce in audio domain. We thus explore a ...
In this work, we provide a broad comparative analysis of strategies for pre-training audio understan...
Very few large-scale music research datasets are publicly available. There is an increasing need for...
In this paper, we introduce DECAR (DEep Clustering for learning general-purpose Audio Representation...
Learning music representations that are general-purpose offers the flexibility to finetune several d...
We propose a novel method to model hierarchical metrical structures for both symbolic music and audi...
Recently the ‘Million Song Dataset’, containing audio features and metadata for one million songs, w...
While deep learning has enabled great advances in many areas of music, labeled music datasets remain...
We demonstrate that language models pre-trained on codified (discretely-encoded) music audio learn r...
Pre-trained models are essential as feature extractors in modern machine learning systems in various...
Methods for extracting audio and speech features have been studied since pioneering work on spectrum...