The goal of this paper is speaker diarisation of videos collected ‘in the wild’. We make three key contributions. First, we propose an automatic audio-visual diarisation method for YouTube videos. Our method consists of active speaker detection using audio-visual methods and speaker verification using self-enrolled speaker models. Second, we integrate our method into a semi-automatic dataset creation pipeline which significantly reduces the number of hours required to annotate videos with diarisation labels. Finally, we use this pipeline to create a large-scale diarisation dataset called VoxConverse, collected from ‘in the wild’ videos, which we will release publicly to the research community. Our dataset consists of overlapping speech, a ...
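The abstract names two stages: audio-visual active speaker detection, then speaker verification against self-enrolled speaker models. It does not give implementation details. The following is only a minimal, hypothetical sketch of how the stages could interact, under several assumptions. First, each speech segment already has a speaker embedding from some verification model. Second, active speaker detection (ASD) either attributes a segment confidently to a face track or leaves it unlabelled. Third, cosine similarity with an arbitrary threshold of 0.5 is used for verification. The function name `self_enrol_and_assign` and the segment fields are illustrative and are not taken from the paper.

```python
import numpy as np


def cosine(a, b):
    # Cosine similarity between two 1-D embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def self_enrol_and_assign(segments, threshold=0.5):
    """Hypothetical sketch, not the paper's method.

    segments: list of dicts with
      'emb': 1-D np.ndarray speaker embedding (assumed precomputed)
      'asd_track': face-track id if ASD is confident, else None
    Returns one speaker label per segment, or None when no enrolled
    model is at least `threshold`-similar.
    """
    # 1. Self-enrolment: average the embeddings of every segment that
    #    ASD attributed to a face track, giving one model per track.
    enrolled = {}
    for seg in segments:
        if seg["asd_track"] is not None:
            enrolled.setdefault(seg["asd_track"], []).append(seg["emb"])
    models = {spk: np.mean(embs, axis=0) for spk, embs in enrolled.items()}

    # 2. Verification: keep ASD labels where available; otherwise pick
    #    the most similar enrolled model that clears the threshold.
    labels = []
    for seg in segments:
        if seg["asd_track"] is not None:
            labels.append(seg["asd_track"])
            continue
        best, best_score = None, threshold
        for spk, model in models.items():
            score = cosine(seg["emb"], model)
            if score >= best_score:
                best, best_score = spk, score
        labels.append(best)
    return labels
```

Example: with two enrolled tracks, an off-screen segment close to track `A` is labelled `A`. A segment dissimilar to every model stays unlabelled (`None`). In a semi-automatic pipeline like the one described, such segments would plausibly be passed to a human annotator.

```python
segs = [
    {"emb": np.array([1.0, 0.0]), "asd_track": "A"},
    {"emb": np.array([0.0, 1.0]), "asd_track": "B"},
    {"emb": np.array([0.9, 0.1]), "asd_track": None},
    {"emb": np.array([-1.0, 0.0]), "asd_track": None},
]
print(self_enrol_and_assign(segs))
```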
The objective of this work is visual recognition of speech and gestures. Solving this problem opens ...
This paper presents a semi-automatic approach to create a diachronic corpus of...
Any multi-party conversation system benefits from speaker diarization, that is...
The objective of this work is speaker recognition under noisy and unconstrained conditions. We make ...
Audio-visual speaker diarization aims at detecting "who spoke when" using both auditory and visual s...
Most existing datasets for speaker identification contain samples obtained under quite constrained c...
Speaker diarization is originally defined as the task of determining “who spoke when” given an aud...
Our goal is to create speaker models in audio domain and face models in video domain from ...
The recent increase in social-media-based propaganda, i.e., ‘fake news’, calls for automated methods...
Speaker diarization is the task of determining “who spoke when?” in an audio or video reco...
This paper describes the BUCEA speaker diarization system for the 2022 VoxCeleb Speaker Recognition ...
The following article presents a novel audio-visual approach for unsupervised speaker localization i...
Automatic speech recognition is increasingly widely and effectively used. Nevertheless, in some aut...
With the rapid growth of multimedia data, especially video, the ability to better and time-...
The objective of this paper is speaker recognition under noisy and unconstrained conditions. We mak...