As one of the most intuitive interfaces known to humans, natural language has the potential to mediate many tasks that involve human-computer interaction, especially in application-focused fields like Music Information Retrieval. In this work, we explore cross-modal learning in an attempt to bridge audio and language in the music domain. To this end, we propose MusCALL, a framework for Music Contrastive Audio-Language Learning. Our approach consists of a dual-encoder architecture that learns the alignment between pairs of music audio and descriptive sentences, producing multimodal embeddings that can be used for text-to-audio and audio-to-text retrieval out-of-the-box. Thanks to this property, MusCALL can be transferred to virtually any tas...
Audio representation learning based on deep neural networks (DNNs) emerged as an alternative approac...
Audio-text retrieval aims at retrieving a target audio clip or caption from a pool of candidates giv...
Mainstream Audio Analytics models are trained to learn under the paradigm of one class label to many...
As one of the most intuitive interfaces known to humans, natural language has the potential to media...
Music tagging and content-based retrieval systems have traditionally been constructed using pre-defi...
Modeling various aspects that make a music piece unique is a challenging task, requiring the combina...
Modeling various aspects that make a music piece unique is a challenging task, requiring the combina...
Traditional music search engines rely on retrieval methods that match natural language queries with ...
This work addresses the problem of matching musical audio directly to sheet music, without any highe...
Deep cross-modal learning has successfully demonstrated excellent performance in cross-modal multime...
We consider the task of retrieving audio using free-form natural language queries. To study this pro...
In this work, we explore the multi-modal information provided by the Youtube-8M dataset by projectin...
Self-supervised audio representation learning offers an attractive alternative for obtaining generic...
Lyric interpretations can help people understand songs and their lyrics quickly, and can also make i...
The emerging field of Music Information Retrieval (MIR) has been influenced by neighboring domains i...
Audio representation learning based on deep neural networks (DNNs) emerged as an alternative approac...
Audio-text retrieval aims at retrieving a target audio clip or caption from a pool of candidates giv...
Mainstream Audio Analytics models are trained to learn under the paradigm of one class label to many...
As one of the most intuitive interfaces known to humans, natural language has the potential to media...
Music tagging and content-based retrieval systems have traditionally been constructed using pre-defi...
Modeling various aspects that make a music piece unique is a challenging task, requiring the combina...
Modeling various aspects that make a music piece unique is a challenging task, requiring the combina...
Traditional music search engines rely on retrieval methods that match natural language queries with ...
This work addresses the problem of matching musical audio directly to sheet music, without any highe...
Deep cross-modal learning has successfully demonstrated excellent performance in cross-modal multime...
We consider the task of retrieving audio using free-form natural language queries. To study this pro...
In this work, we explore the multi-modal information provided by the Youtube-8M dataset by projectin...
Self-supervised audio representation learning offers an attractive alternative for obtaining generic...
Lyric interpretations can help people understand songs and their lyrics quickly, and can also make i...
The emerging field of Music Information Retrieval (MIR) has been influenced by neighboring domains i...
Audio representation learning based on deep neural networks (DNNs) emerged as an alternative approac...
Audio-text retrieval aims at retrieving a target audio clip or caption from a pool of candidates giv...
Mainstream Audio Analytics models are trained to learn under the paradigm of one class label to many...