In this paper we propose two alternatives to overcome the natural asynchrony of modalities in Audio-Visual Speech Recognition. We first investigate the use of asynchronous statistical models based on Dynamic Bayesian Networks with different levels of asynchrony. We show that audio-visual models should consider asynchrony within word boundaries and not at the phoneme level. The second approach adds a processing step to the features before they are used for recognition. The proposed technique aligns the temporal evolution of the audio and video streams in terms of a speech-recognition system and enables the use of simpler statistical models for classification. In both cases we report experiments with the CUAVE database, sho...
Speech recognition, by both humans and machines, benefits from visual observation of the face, espec...
Research on asynchronous audiovisual speech perception manipulates experimental conditions to observ...
Extending automatic speech recognition (ASR) to the visual modality has been shown to greatly increa...
The use of visual features in audio-visual speech recognition (AVSR) is justified by both the speec...
The use of visual features in audio-visual speech recognition (AVSR) is justified by both the speech...
The phenomenon of anticipatory coarticulation provides a basis for the observed asynchrony between ...
In audio-visual automatic speech recognition (AVASR) both acoustic and visual modalities of...
We describe a dynamic Bayesian network for articulatory feature recognition. The model is intended t...
This paper presents the design and evaluation of a speaker-independent audio-visual speech ...
This paper presents a novel Hidden Markov Model architecture to model the joint probability of pairs...
Visual speech information from the speaker’s mouth region has been successfully shown to ...
This paper describes a complete system for audio-visual recognition of continuous speech including r...
This paper presents a novel Hidden Markov Model architecture to model the joint probability of pair...
We study the problem of automatic visual speech recognition (VSR) using dynamic Bayesian network (DB...
This paper builds on previous work where dynamic Bayesian networks (DBN) were proposed as a model f...