We present a novel speaker diarization method using eye-gaze information in multi-party conversations. In real environments, speaker diarization or speech activity detection of each participant of the conversation is challenging because of distant talking and ambient noise. In contrast, eye-gaze information is robust against acoustic degradation, and it is presumed that eye-gaze behavior plays an important role in turn-taking and thus in predicting utterances. The proposed method stochastically integrates eye-gaze information with acoustic information for speaker diarization. Specifically, three models are investigated for multi-modal integration in this paper. Experimental evaluations in real poster sessions demonstrate that the propo...
Recent years have witnessed a growing interest in multimodal features of language use, both for theo...
In this paper, we present a strongly embodied take on the phenomenon of viewpoint by exploring the r...
Classical visual attention models neither consider social cues, such as faces,...
This paper extends the affective computing research field by introducing first-person vi...
Audio-visual speaker diarisation is the task of estimating ``who spoke when'' using audio and visual...
In multi-agent, multi-user environments, users as well as agents should have a means of establishing...
In this paper, we describe two series of experiments that examine audiovisual ...
When humans converse with each other, they naturally amalgamate information from multiple modalitie...
Recent studies in conversation analysis, psycholinguistics and interaction technology have pointed a...
Speaker diarization consists of assigning speech signals to people engaged in ...
We studied whether the gaze direction of users indicates whom they are speaking or listening to in m...
Gaze and language are major pillars in multimodal communication. Gaze is a non-verbal mechanism that...
Face-to-face conversation implies that interaction should be characterized as an inherently multimod...
We present here the analysis of multimodal data gathered during realistic face-to-face interaction o...