In this paper we address the problem of detecting and localizing objects that can be both seen and heard, e.g., people. This may be solved within the framework of data clustering. We propose a new multimodal clustering algorithm based on a Gaussian mixture model, where one of the modalities (visual data) is used to supervise the clustering process. This is made possible by mapping both modalities into the same metric space. To this end, we fully exploit the geometric and physical properties of an audio-visual sensor based on binocular vision and binaural hearing. We propose an EM algorithm that is theoretically well justified, intuitive, and extremely efficient from a computational point of view. This efficiency makes the method impleme...
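The clustering idea summarized above can be illustrated with a minimal sketch. It is not the algorithm of the paper: it assumes that both modalities have already been mapped into a common 2-D space, uses a plain EM for a Gaussian mixture with one spherical variance per component, and models the visual supervision simply by fitting the visual points first and using the resulting means to initialize EM on the pooled data. The function name `em_gmm`, the synthetic data, and all parameter values are hypothetical.

```python
import numpy as np


def em_gmm(points, init_means, n_iter=50, min_var=1e-6):
    """Plain EM for a Gaussian mixture, one spherical variance per component."""
    points = np.asarray(points, dtype=float)
    means = np.array(init_means, dtype=float)
    n, d = points.shape
    k = means.shape[0]
    variances = np.full(k, points.var(axis=0).mean() + min_var)
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point,
        # computed in the log domain for numerical stability.
        sq_dist = ((points[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        log_prob = (np.log(weights)
                    - 0.5 * d * np.log(2.0 * np.pi * variances)
                    - 0.5 * sq_dist / variances)
        log_prob -= log_prob.max(axis=1, keepdims=True)
        resp = np.exp(log_prob)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp.T @ points) / nk[:, None]
        sq_dist = ((points[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        variances = (resp * sq_dist).sum(axis=0) / (d * nk) + min_var
    return weights, means, variances, resp


# Synthetic data (assumption): two audio-visual objects in a shared 2-D space.
rng = np.random.default_rng(0)
true_centres = np.array([[0.0, 0.0], [5.0, 5.0]])
visual = np.vstack([c + 0.2 * rng.standard_normal((100, 2)) for c in true_centres])
audio = np.vstack([c + 0.8 * rng.standard_normal((30, 2)) for c in true_centres])

# Stage 1 (assumed form of supervision): cluster the visual points alone,
# seeded with one visual point taken from each block of the synthetic data.
_, visual_means, _, _ = em_gmm(visual, visual[[0, -1]])

# Stage 2: run EM on the pooled data, initialized at the visual cluster means.
weights, means, variances, resp = em_gmm(np.vstack([visual, audio]), visual_means)
print(means)
```

Initializing from the denser, less noisy modality is only one simple way to let visual data guide the clustering; the sketch makes no claim about how the paper couples the two modalities inside the EM iterations.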
In this thesis, the modelling of audio-visual perception with a head-like device is considered. The ...
Data clustering has received a lot of attention and numerous methods, algorith...
Current computer vision techniques can effectively monitor gross activities in sparse environments. ...
In this paper we address the problem of detecting and localizing objects that can be both seen and h...
This paper addresses the issues of detecting and localizing objects in a scene...
We address the issue of identifying and localizing individuals in a scene that...
In this paper we address the problem of detecting and locating speakers using ...
Natural human-robot interaction in complex and unpredictable environments is one of the main researc...
The problem of multimodal clustering arises whenever the data are gathered wit...
This chapter presents novel computationally efficient algorithms to extract semantically me...
In this paper we address the problem of audio-visual speaker detection. We int...
In this paper we address the problem of audio-visual speaker detection. We introduce an onl...
One of the main issues within the field of social robotics is to endow robots with the ability to di...
Acoustic events produced in meeting environments may contain useful information for perceptually awa...