We introduce a seemingly impossible task: given only an audio clip of someone speaking, decide which of two face images is the speaker. In this paper we study this, and a number of related cross-modal tasks, aimed at answering the question: how much can we infer from the voice about the face and vice versa? We study this task “in the wild”, employing the datasets that are now publicly available for face recognition from static images (VGGFace) and speaker identification from audio (VoxCeleb). These provide training and testing scenarios for both static and dynamic testing of cross-modal matching. We make the following contributions: (i) we introduce CNN architectures for both binary and multi-way cross-modal face and audio matching; (ii) w...
This thesis presents a novel method of audio-visual fusion, known as multi-modal optimal f...
Face and voice are two preeminent physical cues describing a person. In unimodal face studies, faces...
Many face studies have shown that in memory tasks, distinctive faces are more easily recognized than...
Person identification using audio (speech) and visual (facial appearance, static or dynamic) modalit...
From border controls to personal devices, from online exam proctoring to human-robot interaction, bi...
Previous research has suggested that people are unable to correctly choose which unfamiliar voice an...
Investigating face recognition with voices and face morphs. Humans can easily identify faces at the i...
The research presented in this paper describes audio-visual speaker identification experime...
Speech perception provides compelling examples of a strong link between auditory and visual ...
Research suggests that both static and dynamic faces share identity information with voices. However...
Biometrics identification using multiple modalities has attracted the attention of many researchers ...
Recent years have seen a surge of work on finding associations between faces and voices within a cross-modal ...
The objective of this work is speaker recognition under noisy and unconstrained conditions. We make ...
Speaker recognition has achieved great progress recently; however, it is not easy or efficient to furthe...