We describe a novel approach for determining the audio-visual synchrony of a monologue video sequence, using vocal pitch and facial landmark trajectories as descriptors of the audio and visual modalities, respectively. The visual component is represented by the horizontal and vertical displacements of corresponding facial landmarks between consecutive frames. These facial landmarks are obtained with a statistical modeling technique known as the Active Shape Model (ASM). The audio component is represented by the fundamental frequency, or pitch, estimated using the subharmonic-to-harmonic ratio (SHR). The synchrony between the audio and visual feature vectors is computed using Gaussian mutual information. The raw synchrony estimates obtai...
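As a concrete illustration of the final step, the sketch below (a minimal Python example; the variable names, feature dimensions, and the small regularization term are our own assumptions, not the authors' implementation) estimates Gaussian mutual information between a per-frame pitch track and per-frame landmark displacements, using the standard jointly-Gaussian formula I(A;V) = 1/2 log(|Σ_A| |Σ_V| / |Σ_AV|):

```python
import numpy as np

def gaussian_mutual_information(audio_feats, visual_feats, eps=1e-8):
    """Estimate Gaussian mutual information between two aligned feature streams.

    audio_feats  : (T, Da) array, e.g. per-frame pitch values from an SHR tracker.
    visual_feats : (T, Dv) array, e.g. per-frame x/y landmark displacements.
    Both streams are assumed to be sampled at the same frame rate and time-aligned.
    """
    joint = np.hstack([audio_feats, visual_feats])   # (T, Da + Dv)
    cov = np.cov(joint, rowvar=False)                # joint covariance matrix
    da = audio_feats.shape[1]
    cov_a = cov[:da, :da]                            # audio covariance block
    cov_v = cov[da:, da:]                            # visual covariance block
    # I(A;V) = 0.5 * log(|Σ_A| |Σ_V| / |Σ_AV|) for jointly Gaussian features;
    # eps * I regularizes near-singular covariance blocks.
    _, logdet_a = np.linalg.slogdet(cov_a + eps * np.eye(cov_a.shape[0]))
    _, logdet_v = np.linalg.slogdet(cov_v + eps * np.eye(cov_v.shape[0]))
    _, logdet_j = np.linalg.slogdet(cov + eps * np.eye(cov.shape[0]))
    return 0.5 * (logdet_a + logdet_v - logdet_j)

# Hypothetical usage (helper functions are placeholders, not real library calls):
# f0   = shr_pitch_track(audio)            # shape (T, 1)
# disp = landmark_displacements(frames)    # shape (T, 2 * num_landmarks)
# score = gaussian_mutual_information(f0, disp)
```

Evaluated over short sliding windows rather than the whole sequence, the same computation yields local synchrony scores, in line with the windowed estimates the abstract refers to.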
The phenomenon of anticipatory coarticulation provides a basis for the observed asynchrony between ...
In this paper, we propose a novel method that exploits correlation between audio-visual dynamics of ...
In this paper, we address the problem of lip-voice synchronisation in videos containing human face a...
Detecting local audio-visual synchrony in monologues utilizing vocal pitch and facial landmark traje...
Previous research suggests that people are rather poor at perceiving auditory-visual (AV) speech asy...
This thesis presents a computational framework to jointly analyze auditory and visual information. T...
The role of audio–visual speech synchrony for speaker diarisation is investigated on the multiparty ...
Audiovisual speech synchrony detection is an important part of talking-face verification s...
This paper proposes a method for recovering audio-visual synchronization of multimedia content. It expl...
This paper presents a novel method to correlate audio and visual data generated by the same physical...
Humans can extract the speech signals they need to understand from a mixture of background noise, in...
In this paper, we address the problem of automatic discovery of speech patterns using audio-visual i...
The fine temporal structure of the relations between acoustic and visual features has been investigated to im...
Psychophysical and physiological evidence shows that sound localization of acoustic signals is stro...
In our approach, we aim at an objective measurement of synchrony in multimodal behavior. The use of ...