Most state-of-the-art approaches address speaker diarization as a hierarchical agglomerative clustering problem in the audio domain. In this paper, we propose to revisit one of them: speech turn clustering based on the Bayesian Information Criterion (a.k.a. BIC clustering). First, we show how to model it as an integer linear programming (ILP) problem. Solving this problem leads to the same overall diarization error rate as standard BIC clustering but generates significantly purer speaker clusters. Then, we describe how this approach can easily be extended to the audiovisual domain, and to TV broadcast in particular. The straightforward integration of detected overlaid names (used to introduce guests or journalists, and obtained via video OCR) into a...
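To make the idea of clustering as an integer linear program concrete, here is a minimal sketch in Python. It is not the formulation used in the paper, which this abstract does not spell out. It assumes a generic pairwise model: one binary variable per pair of speech turns (1 means "same speaker"), a linear objective that rewards grouping similar pairs and separating dissimilar ones, and transitivity constraints so that the pairwise decisions form valid clusters. The function name `cluster_by_pairwise_ilp`, the score matrix `sim` and the weight `alpha` are all hypothetical, and exhaustive enumeration stands in for a real ILP solver, so the sketch only suits a handful of turns.

```python
from itertools import combinations, product


def cluster_by_pairwise_ilp(sim, alpha=0.5):
    """Solve a tiny pairwise-clustering ILP by exhaustive enumeration.

    sim[i][j] is a hypothetical same-speaker score in [0, 1] for speech
    turns i and j (assumption: any symmetric similarity could be used).
    Returns one cluster label per speech turn.
    """
    n = len(sim)
    pairs = list(combinations(range(n), 2))
    best_value, best_x = float("-inf"), None

    # Enumerate every 0/1 assignment of the pairwise variables x[i, j].
    for bits in product((0, 1), repeat=len(pairs)):
        x = dict(zip(pairs, bits))

        def same(i, j):
            return 1 if i == j else x[(min(i, j), max(i, j))]

        # Transitivity constraints: x_ij + x_jk - x_ik <= 1 for all triples.
        feasible = all(
            same(i, j) + same(j, k) - same(i, k) <= 1
            for i in range(n)
            for j in range(n)
            for k in range(n)
            if len({i, j, k}) == 3
        )
        if not feasible:
            continue

        # Linear objective (an assumption, not the paper's exact objective).
        value = sum(
            alpha * x[(i, j)] * sim[i][j]
            + (1 - alpha) * (1 - x[(i, j)]) * (1 - sim[i][j])
            for i, j in pairs
        )
        if value > best_value:
            best_value, best_x = value, x

    # Label each turn with the smallest index of a turn in its cluster.
    labels = []
    for j in range(n):
        labels.append(
            min(i for i in range(j + 1) if i == j or best_x[(i, j)] == 1)
        )
    return labels


# Toy scores: turns 0 and 1 look alike, as do turns 2 and 3.
toy_sim = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]
toy_labels = cluster_by_pairwise_ilp(toy_sim)
```

On the toy scores, turns 0 and 1 end up in one cluster and turns 2 and 3 in another. In practice the same variables, objective and constraints would be handed to an ILP solver rather than enumerated. Extra knowledge, such as a detected overlaid name attached to a speech turn, could in principle enter as additional variables or constraints, but how the paper does this is not described in the text above.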
This paper proposes a method for segmenting and clustering an audio flow on th...
While successful on broadcast news, meetings or telephone conversation, state-...
This paper presents a new multimodal approach to speaker diarization of TV show data. We hypothesize...
Most state-of-the-art approaches address speaker diarization as a hierarchic...
In this paper we present our system for speaker diarization of broadcast news based on recent advan...
This paper describes a system to identify people in broadcast TV shows in a purely unsupervised mann...
In this paper, we propose a new clustering model for speaker diarization. A ma...
This paper describes recent advances in speaker diarization by incorporating a...
This paper describes recent advances in speaker diarization with a mu...
First we propose a reformulation of the Integer Linear Programming (ILP) clustering method we intro...
We propose to study speaker diarization from a collection of audio documents. ...
Given a piece of audio recording, the task of speaker diarization can be summarized as answering the...
In this paper we present a new diarization system based on the combination of LIUM and IRIT systems....
This paper investigates single and cross-show diarization based on an unsuperv...
We present a novel probabilistic framework that fuses information coming from the audio and video mo...