This paper presents a new multimodal approach to speaker diarization of TV show data. We hypothesize that the intra-speaker variation in visual information might be less than that in the corresponding acoustic information, and that visual information might therefore be better suited to the task of speaker model initialisation. Initialisation is an acknowledged weakness of the computationally efficient top-down approach to speaker diarization that is used here. Experimental results show that a recently proposed approach to purification and the new multimodal approach to initialisation together deliver 22% and 17% relative improvements in diarization performance over the baseline system on independent development and evaluation datasets, respectively.
This paper presents a multi-stage speaker diarisation system with longitudinal linking developed on ...
Speaker Diarization is the process of partitioning an audio input into homogeneous segments accordin...
In this paper we present our system for speaker diarization of broadcast news based on recent advan...
Most speaker diarization systems fit into one of two categories: bottom-up or top-down. Bottom-up s...
Most state-of-the-art approaches address speaker diarization as a hierarchical agglomerative cluster...
This paper investigates single and cross-show diarization based on an unsuperv...
Speaker diarization may be difficult to achieve when applied to narrative film...
This paper describes a system to identify people in broadcast TV shows in a purely unsupervised mann...
While successful on broadcast news, meetings or telephone conversation, state-...
Our goal is to create speaker models in audio domain and face models in video domain from ...
This paper describes recent advances in speaker diarization with a mu...
The paper describes a novel method that improves the procedure for supervised speaker diarization....
Audio-Visual People Diarization (AVPD) is an original framework that simultane...
We present a novel probabilistic framework that fuses information coming from the audio and video mo...
In this paper we present a novel scheme for improving speaker diarization by making use of repeating...