Most existing datasets for speaker identification contain samples obtained under quite constrained conditions, and are usually hand-annotated, hence limited in size. The goal of this paper is to generate a large scale text-independent speaker identification dataset collected ‘in the wild’. We make two contributions. First, we propose a fully automated pipeline based on computer vision techniques to create the dataset from open-source media. Our pipeline involves obtaining videos from YouTube; performing active speaker verification using a two-stream synchronization Convolutional Neural Network (CNN), and confirming the identity of the speaker using CNN based facial recognition. We use this pipeline to curate VoxCeleb which contains hund...
Speaker identification with deep learning commonly uses a time-frequency representation of the voice si...
In this technical report, we describe the Royalflush submissions for the VoxCeleb Speaker Recognitio...
Multimedia databases are growing rapidly in size in the digital age. To increase the value of these ...
The objective of this work is speaker recognition under noisy and unconstrained conditions. We make ...
The objective of this paper is speaker recognition under noisy and unconstrained conditions. We mak...
Speaker Recognition (SR) is a common task in AI-based sound analysis, involving structurally differe...
This Master's Thesis (MT) describes different techniques for speaker identification. Our goal is two-f...
Deep learning, especially in the form of convolutional neural networks (CNNs), has triggered substan...
The goal of this paper is speaker diarisation of videos collected ‘in the wild’. We make three key ...
Artificial Intelligence plays a fundamental role in the speech-based interaction between humans and ...
Our aim is to recognise the words being spoken by a talking face, given only the video but not the a...
Speaker identification techniques are among the most advanced modern technologies, and there are m...
The performance of speaker recognition systems has considerably improved in the last decade. This is...
In speaker recognition tasks, convolutional neural network (CNN)-based approaches have shown signifi...