Situational awareness is achieved naturally by the human senses of sight and hearing in combination. System-level automatic scene understanding aims at replicating this human ability using cooperative microphones and cameras. In this thesis, we integrate and fuse audio and video signals at different levels of abstraction to detect and track a speaker in a scenario where people are free to move indoors. Despite the low complexity of the system, which consists of just four microphone pairs and one camera, results show that the overall multimodal tracker is more reliable than single-modality systems, tolerating large occlusions and cross-talk. The system evaluation is performed on both single-modality and multimodal tracking. The pe...
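The abstract does not say how the audio and video estimates are combined. As one illustrative possibility only, the following sketch assumes a constant-velocity Kalman filter over the speaker's floor-plane position, in which the audio (microphone-pair) and video (camera) detections enter as independent position measurements. A modality that yields nothing in a given frame, for instance video during an occlusion, is skipped and the filter relies on the other. The class name `FusionKalmanFilter`, the frame period `dt`, the process noise `q`, and the noise levels `audio_r`/`video_r` are hypothetical choices, not values from the thesis.

```python
import numpy as np


class FusionKalmanFilter:
    """Hypothetical audio-video fusion tracker (illustrative sketch only).

    State is [x, y, vx, vy] on the floor plane under a constant-velocity
    model. Both modalities are assumed to observe position only.
    """

    def __init__(self, dt=0.1, q=0.05):
        self.x = np.zeros(4)            # initial state estimate
        self.P = np.eye(4) * 10.0       # large initial uncertainty
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.Q = np.eye(4) * q          # assumed process noise
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)

    def predict(self):
        # Propagate the state and its covariance one frame ahead.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q

    def update(self, z, r):
        """Correct the state with a 2D position z whose noise std dev is r."""
        R = np.eye(2) * r ** 2
        y = np.asarray(z, dtype=float) - self.H @ self.x   # innovation
        S = self.H @ self.P @ self.H.T + R                 # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)           # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P

    def step(self, audio_z=None, video_z=None, audio_r=0.5, video_r=0.1):
        """Run one tracking cycle and return the estimated (x, y).

        Measurements are applied one after the other, which for a linear
        Kalman filter matches a joint update when the audio and video noises
        are independent. A modality passed as None (e.g. video during an
        occlusion, audio during silence) is skipped.
        """
        self.predict()
        if audio_z is not None:
            self.update(audio_z, audio_r)
        if video_z is not None:
            self.update(video_z, video_r)
        return self.x[:2].copy()
```

For example, `kf.step(audio_z=(1.3, 1.1))` performs an audio-only update for a frame in which the camera view is occluded, while `kf.step()` with no measurements only propagates the prediction, so the position uncertainty grows until a modality reports again.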
Tracking speakers in multi-party conversations represents an important step towards automatic analys...
We propose an audio-visual fusion algorithm for 3D speaker tracking from a localised multi-modal sen...
We describe the design of a system consisting of several state-of-the-art real-time audio and video ...
It is often advantageous to track objects in a scene using multimodal information when such informat...
PhD Thesis: This thesis concerns the problem of target localization and tracking in an indoor environm...
Multiple-speaker tracking is a crucial task for many applications. In real-wor...
The objective of the MultiModal Meeting Manager (M4) project is to produce a system to enable struct...
An integrated system approach was developed to address the problem of distant speech acquisition in ...
This paper addresses the coordinated use of video and audio cues to capture and index surveillance e...