International audienceHumans can easily recognize where and how the sound is produced via watching a scene and listening to corresponding audio cues. To achieve such cross-modal perception on machines, existing methods only use the maps generated by interpolation operations to localize the sound source. As semantic object-level localization is more attractive for potential practical applications, we argue that these existing map-based approaches only provide a coarse-grained and indirect description of the sound source. In this paper, we advocate a novel proposal-based paradigm that can directly perform semantic object-level localization, without any manual annotations. We incorporate the global response map as an unsupervised spatial const...
To identify the localization of indoor sound source, especially when attempted using only a single m...
International audienceSound sources localization (SSL) is a subject of active research in the field ...
Humans can robustly recognize and localize objects by integrating visual and auditory cues. While ma...
International audienceHumans can easily recognize where and how the sound is produced via watching a...
Visual events are usually accompanied by sounds in our daily lives. However, can the machines learn ...
Audio-visual source localization is a challenging task that aims to predict the location of visual s...
Learning to localize the sound source in videos without explicit annotations is a novel area of audi...
We present a method for simultaneously localizing multiple sound sources within a visual scene. This...
Sound source localization is an important topic in humanmachine interacting, teleconferencing, secur...
International audienceThis article is a survey of deep learning methods for single and multiple soun...
Data-based and learning-based sound source localization (SSL) has shown promising results in challen...
Deep learning has fueled an explosion of applications, yet training deep neural networks usually req...
In this paper our objectives are, first, networks that can embed audio and visual inputs into a comm...
Early computational approaches for sound source localization, originating in robotics, were modeled ...
Acoustic images constitute an emergent data modality for multimodal scene understanding. Such images...
To identify the localization of indoor sound source, especially when attempted using only a single m...
International audienceSound sources localization (SSL) is a subject of active research in the field ...
Humans can robustly recognize and localize objects by integrating visual and auditory cues. While ma...
International audienceHumans can easily recognize where and how the sound is produced via watching a...
Visual events are usually accompanied by sounds in our daily lives. However, can the machines learn ...
Audio-visual source localization is a challenging task that aims to predict the location of visual s...
Learning to localize the sound source in videos without explicit annotations is a novel area of audi...
We present a method for simultaneously localizing multiple sound sources within a visual scene. This...
Sound source localization is an important topic in humanmachine interacting, teleconferencing, secur...
International audienceThis article is a survey of deep learning methods for single and multiple soun...
Data-based and learning-based sound source localization (SSL) has shown promising results in challen...
Deep learning has fueled an explosion of applications, yet training deep neural networks usually req...
In this paper our objectives are, first, networks that can embed audio and visual inputs into a comm...
Early computational approaches for sound source localization, originating in robotics, were modeled ...
Acoustic images constitute an emergent data modality for multimodal scene understanding. Such images...
To identify the localization of indoor sound source, especially when attempted using only a single m...
International audienceSound sources localization (SSL) is a subject of active research in the field ...
Humans can robustly recognize and localize objects by integrating visual and auditory cues. While ma...