The aim of this work is to examine the correlation between audio and visual speech features. The motivation is to find visual features that can provide clean audio feature estimates for speech enhancement when the original audio signal is corrupted by noise. Two audio features (MFCCs and formants) and three visual features (active appearance model, 2-D DCT and cross-DCT) are considered, with correlation measured using multiple linear regression. The correlation is then exploited through the development of a maximum a posteriori (MAP) prediction of audio features solely from the visual features. Experiments reveal that features representing broad spectral information have higher correlation to visual features than those representing finer spectral detail.
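As a rough illustration of the correlation analysis described above, the sketch below fits a multiple linear regression from visual feature vectors to each audio feature dimension and reports the resulting multiple correlation coefficient. It is a minimal sketch of the general technique only, not the paper's implementation; the arrays `visual_feats` and `audio_feats` are hypothetical stand-ins for time-aligned per-frame features.

```python
import numpy as np

def multiple_linear_regression_r(visual, audio):
    """Fit a linear map from visual features to each audio feature
    dimension and return the multiple correlation coefficient R per
    audio dimension (one way to quantify audio-visual correlation).

    visual: (N, Dv) array of per-frame visual features (e.g. AAM or 2-D DCT)
    audio:  (N, Da) array of per-frame audio features (e.g. MFCCs or formants)
    """
    N = visual.shape[0]
    # Append a bias column so the regression has an intercept term.
    X = np.hstack([visual, np.ones((N, 1))])
    # Least-squares weights W minimising ||X @ W - audio||^2.
    W, *_ = np.linalg.lstsq(X, audio, rcond=None)
    pred = X @ W
    # R for each audio dimension: correlation between predicted and true tracks.
    r = np.array([np.corrcoef(pred[:, d], audio[:, d])[0, 1]
                  for d in range(audio.shape[1])])
    return W, r

# Illustrative usage with random stand-in data; real features would come
# from aligned audio-visual speech frames.
rng = np.random.default_rng(0)
visual_feats = rng.normal(size=(500, 20))
audio_feats = visual_feats @ rng.normal(size=(20, 12)) + 0.5 * rng.normal(size=(500, 12))
W, r = multiple_linear_regression_r(visual_feats, audio_feats)
print("per-dimension multiple correlation R:", np.round(r, 2))
```

The fitted linear map also doubles as a simple predictor of audio features from visual features; under a joint-Gaussian modelling assumption the MAP estimate of one feature set given the other is likewise linear in the conditioning features, which is the role the MAP predictor plays in the work summarised above.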
This work develops a statistical framework to predict acoustic features (fundamental frequency, form...
Correlations between colour patch parameters and audio features in 27 film music stimuli.
The addition of visual information derived from the speaker's lip movements to a speec...
The aim of this work is to investigate a selection of audio and visual speech features with the aim ...
As evidence of a link between the various human communication production domains has become more pro...
In this paper, we address the problem of automatic discovery of speech patterns using audio-visual i...
This paper investigates the statistical relationship between acoustic and visual speech features for...
This paper examines the degree of correlation between lip and jaw configuration and speech acoustics...
The 2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX), Athlone, Ireland.
The aim of this paper is to use visual speech information to create Wiener filters for audio speech ...
This work begins by examining the correlation between audio and visual speech features and reveals h...
This work presents a novel approach to speech enhancement by exploiting the bimodality of speech and...
This work is concerned with generating intelligible audio speech from a video of a person talking. R...
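Several of the entries above turn a visually derived estimate of the clean speech spectrum into a Wiener filter for enhancement. The following is a minimal sketch of that filtering step under stated assumptions: the per-frame power spectral density estimates `clean_psd_est` and `noise_psd_est` are hypothetical inputs (in the visually driven case the clean estimate would be predicted from lip features rather than measured), and the code is not any particular paper's implementation.

```python
import numpy as np

def wiener_gain(clean_psd_est, noise_psd_est, floor=1e-3):
    """Frequency-domain Wiener gain G = S / (S + N) computed from estimates
    of the clean-speech and noise power spectra, floored to limit
    musical-noise artefacts when the clean estimate is near zero."""
    gain = clean_psd_est / (clean_psd_est + noise_psd_est + 1e-12)
    return np.maximum(gain, floor)

def enhance_frame(noisy_frame, clean_psd_est, noise_psd_est):
    """Apply the Wiener gain to one windowed frame of noisy speech."""
    spectrum = np.fft.rfft(noisy_frame)
    gain = wiener_gain(clean_psd_est, noise_psd_est)
    return np.fft.irfft(gain * spectrum, n=len(noisy_frame))

# Toy usage: flat spectral estimates stand in for the visually predicted ones.
frame = np.random.default_rng(1).normal(size=256)
bins = 256 // 2 + 1
enhanced = enhance_frame(frame, np.ones(bins), 0.5 * np.ones(bins))
```

In practice the gain is computed per analysis frame and the enhanced frames are overlap-added back into a waveform.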