While successful on broadcast news, meetings, or telephone conversations, state-of-the-art speaker diarization techniques tend to perform poorly on TV series or movies. In this paper, we propose to rely on state-of-the-art face clustering techniques to guide acoustic speaker diarization. Two approaches are tested and evaluated on the first season of the Game of Thrones TV series. The second (better) approach relies on a novel talking-face detection module based on a bidirectional long short-term memory recurrent neural network. Both audio-visual approaches outperform the audio-only baseline. A detailed study of the behavior of these approaches is also provided and paves the way to fu...
The paper concentrates on speaker diarization over meeting recordings. The task of speaker diarizati...
This paper describes recent advances in speaker diarization with a mu...
This paper presents a new multimodal approach to speaker diarization of TV show data. We hypothesize...
The goal in Speaker Diarization (SD) is to answer the question "Who spoke when?" for a given audio w...
Most state-of-the-art approaches address speaker diarization as a hierarchical agglomerative cluster...
We investigate the problem of audio-visual (AV) person diarization in broadcast data. That is, autom...
Speaker diarisation is the task of answering "who spoke when" within a multi-speaker audio recording...
This paper presents a multi-stage speaker diarisation system with longitudinal linking developed on ...
Our goal is to automatically identify faces in TV broadcast without a pre-defi...
Given a piece of audio recording, the task of speaker diarization can be summarized as answering the...
We investigate the problem of audiovisual (AV) person diarization in broadcas...
Our goal is to automatically identify faces in TV broadcast without a pre-defined dictionary of iden...
Speaker diarization finds contiguous speaker segments in an audio recording and clusters them by spe...
In this paper we present our system for speaker diarization of broadcast news based on recent advan...
Our goal is to create speaker models in audio domain and face models in video domain from ...