This work investigates cross-modal connections between audio and video sources in the task of musical instrument recognition. We also address the understanding of the representations learned by convolutional neural networks (CNNs), and we study feature correspondence between the audio and visual components of a multimodal CNN architecture. For each instrument category, we select the most activated neurons and investigate existing cross-correlations between neurons from the audio and video CNNs that activate for the same instrument category. We analyse two training schemes for multimodal applications and perform a comparative analysis and visualisation of model predictions. This work is supported by the Spanish Ministry of...
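The neuron-selection and cross-correlation analysis described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of per-class mean activation to rank neurons, and Pearson correlation as the correlation measure are assumptions for the sketch.

```python
import numpy as np

def top_activated_units(acts, labels, target, k=5):
    """Indices of the k units with the highest mean activation
    over samples of the target instrument class.
    acts: (n_samples, n_units) activation matrix from one modality."""
    mean_act = acts[labels == target].mean(axis=0)
    return np.argsort(mean_act)[-k:][::-1]

def cross_modal_correlation(audio_acts, video_acts, labels, target, k=5):
    """Pearson correlation between each top-activated audio unit and
    each top-activated video unit, over samples of the target class."""
    mask = labels == target
    a_idx = top_activated_units(audio_acts, labels, target, k)
    v_idx = top_activated_units(video_acts, labels, target, k)
    corr = np.zeros((k, k))
    for i, a in enumerate(a_idx):
        for j, v in enumerate(v_idx):
            corr[i, j] = np.corrcoef(audio_acts[mask, a],
                                     video_acts[mask, v])[0, 1]
    return a_idx, v_idx, corr
```

A high off-diagonal entry in `corr` would indicate an audio neuron and a video neuron that respond jointly to the same instrument category, which is the kind of cross-modal correspondence the abstract describes.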
To detect audio manipulation in pre-recorded evidence videos by developing a synchronization verif...
In this work, we employ deep learning methods for visual onset detection. We focus on live music per...
In this paper, we propose a multimodal deep learning architecture for emotion recognition in video r...
Paper presented at: 18th International Society for Music Information Retrieval Conference (ISM...
This paper presents a method for recognising musical instruments in user-generated videos. Musical i...
Paper presented at the International Conference on Multimedia Retrieval, held from the 6th to the 9th of...
Identifying musical instruments in a polyphonic music recording is a difficult yet crucial problem i...
In music perception, the information we receive from a visual system and audio system is often compl...
Although instrument recognition has been thoroughly researched, recognition in polyphonic music still ...
This paper proposes a method to facilitate labelling of music performance videos with automatic meth...
Predominant instrument recognition in polyphonic music is addressed using the score-level fusion of ...
In recent years, there have been numerous developments toward solving multimodal tasks, aiming to le...