We address the problem of both estimating the dominant person in a meeting from a single audio source and identifying that person visually in a multi-camera setting. We use a speaker diarization algorithm to perform speaker segmentation and clustering, which indicates when each speaker spoke. Using a greedy ordered audio-visual association algorithm, we investigate how the speaker clusters can be used to find the corresponding person in one of the video channels. The problem is difficult for two reasons. Firstly, the speaker diarization output is noisy (e.g. for participants who speak little) and often produces a number of clusters that differs from the true number of participants. Secondly, personal visual activity from natural upper torso motion, which can include highly deformable pose cha...
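The abstract above names a greedy ordered audio-visual association step but does not spell it out here. The following is only a minimal sketch of what such a step might look like, not the authors' method. It assumes each speaker cluster is given as a per-frame speaking-status vector, each video channel as a per-frame visual-activity vector, and Pearson correlation as the association score. The function name `greedy_av_association` and both input layouts are hypothetical.

```python
import numpy as np


def greedy_av_association(audio_activity, visual_activity):
    """Greedily pair speaker clusters with video channels (illustrative sketch).

    audio_activity:  (n_clusters, n_frames) speaking status per cluster.
    visual_activity: (n_channels, n_frames) motion activity per channel.
    Returns a dict {cluster_index: channel_index}.
    """
    audio = np.asarray(audio_activity, dtype=float)
    visual = np.asarray(visual_activity, dtype=float)
    n_clusters, n_channels = audio.shape[0], visual.shape[0]

    # Assumed score: Pearson correlation between each cluster and each channel.
    # Constant signals have undefined correlation and are left at -inf.
    corr = np.full((n_clusters, n_channels), -np.inf)
    for i in range(n_clusters):
        for j in range(n_channels):
            if audio[i].std() > 0 and visual[j].std() > 0:
                corr[i, j] = np.corrcoef(audio[i], visual[j])[0, 1]

    mapping = {}
    # The number of clusters may differ from the number of channels,
    # so at most min(n_clusters, n_channels) pairs are formed.
    for _ in range(min(n_clusters, n_channels)):
        i, j = np.unravel_index(np.argmax(corr), corr.shape)
        if not np.isfinite(corr[i, j]):
            break  # no usable pairs remain
        mapping[int(i)] = int(j)
        corr[i, :] = -np.inf  # this cluster is now assigned
        corr[:, j] = -np.inf  # this channel is now taken
    return mapping


# Toy example: two speaker clusters, three video channels.
audio = [[1, 1, 0, 0, 0, 0],
         [0, 0, 1, 1, 0, 0]]
visual = [[0.1, 0.0, 0.9, 0.8, 0.1, 0.0],
          [0.9, 0.8, 0.1, 0.0, 0.1, 0.0],
          [0.0, 0.0, 0.0, 0.0, 1.0, 1.0]]
print(greedy_av_association(audio, visual))
```

Taking the best-scoring pair first means the most reliable associations are fixed before the noisier ones, which is one plausible reading of "ordered". An unmatched channel or cluster is simply left out of the mapping.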
Speaker Diarization aims at inferring who spoke when in an audio stream and involves two simultaneou...
We investigate the problem of audio-visual (AV) person diarization in broadcast data. That is, autom...
In this paper we address the problem of estimating who is speaking from automatically extracted low...
Dominance - a behavioral expression of power - is a fundamental mechanism of social interaction, exp...
With the increase in cheap commercially available sensors, recording meetings is becoming an increas...
This paper addresses the multimodal nature of social dominance and presents multimodal fusion techni...
In this paper, we apply speaker diarization strategies from a single source to the task of estimatin...
Dominance refers to the level of influence that a person has in a conversation. Dominance is an...
Audio-visual speaker diarisation is the task of estimating "who spoke when" using audio and visual...
This paper addresses the multimodal nature of social dominance and pr...
This paper addresses the problem of automatically predicting the dominant clique (i.e., the set of ...
Any multi-party conversation system benefits from speaker diarization, that is...
Speaker diarization consists of assigning speech signals to people engaged in ...