Large-scale sound recognition data sets typically consist of acoustic recordings obtained from multimedia libraries. As a consequence, modalities other than audio can often be exploited to improve the outputs of models designed for associated tasks. Frequently, however, not all contents are available for all samples of such a collection: For example, the original material may have been removed from the source platform at some point, and therefore, non-auditory features can no longer be acquired. We demonstrate that a multi-encoder framework can be employed to deal with this issue by applying this method to attention-based deep learning systems, which are currently part of the state of the art in the domain of sound recognition. More speci...
Humans perceive the world by concurrently processing and fusing high-dimensional inputs from multipl...
Visual objects often have acoustic signatures that are naturally synchronized with them in audio-bea...
Multimodal deep learning aims at combining the complementary information of different modalities. Am...
We study the merit of transfer learning for two sound recognition problems, i.e., audio tagging and ...
Whether crossing the road or enjoying a concert, sound carries important information about the world...
Audio tagging is the task of predicting the presence or absence of sound classes within an audio cli...
Audio tagging is the task of predicting the presence or absence of sound classes within an audio cli...
Environmental sound and acoustic scene classification are crucial tasks in audio signal processing a...
Deep learning has fueled an explosion of applications, yet training deep neural networks usually req...
This paper studies audio-visual suppression for egocentric videos -- where the speaker is not captur...
In this paper, we propose a multi-level attention model for the weakly labelled audio classification...
Humans are constantly exposed to a variety of acoustic stimuli ranging from music and speech to more...
We study the usability of pre-trained weakly supervised audio tagging (AT) models as feature extract...
Automated analysis of complex scenes of everyday sounds might help us navigate within the enormous a...