Abstract — Visual speech information from the speaker’s mouth region has been shown to improve the noise robustness of automatic speech recognizers, thus promising to extend their usability in the human-computer interface. In this paper, we review the main components of audio-visual automatic speech recognition and present novel contributions in two main areas: first, the visual front-end design, based on a cascade of linear image transforms of an appropriate video region of interest, and subsequently, audio-visual speech integration. On the latter topic, we discuss new work on feature and decision fusion combination, the modeling of audio-visual speech asynchrony, and incorporating modality reliability estimates into the bimodal re...
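The reliability-weighted decision fusion referred to in this abstract is most often formalized as a multi-stream model in which audio and visual class log-likelihoods are combined with stream exponents. The sketch below is only one standard instantiation of that idea, not the paper's exact formulation; the symbols $o_A$, $o_V$, $c$, and $\lambda$ are illustrative.

% Decision fusion of audio and visual streams with a reliability weight.
% o_A, o_V: audio and visual observations; c: speech class (e.g., HMM state);
% lambda: audio stream weight, 0 <= lambda <= 1 (illustrative notation).
\[
  \log p(o_A, o_V \mid c) \;=\; \lambda \,\log p_A(o_A \mid c) \;+\; (1-\lambda)\,\log p_V(o_V \mid c), \qquad 0 \le \lambda \le 1 .
\]

Here $p_A$ and $p_V$ denote the audio-only and visual-only class-conditional likelihoods, and the audio weight $\lambda$ would typically be tied to a reliability estimate such as the acoustic signal-to-noise ratio, so that the visual stream dominates as the audio degrades.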
There has been growing interest in introducing speech as a new modality into the human-computer inte...
This paper describes a complete system for audio-visual recognition of continuous speech including r...
The objective of this work is visual recognition of speech and gestures. Solving this problem opens ...
Speechreading increases intelligibility in human speech perception. This suggests that conventional ...
Extending automatic speech recognition (ASR) to the visual modality has been shown to greatly increa...
Extending automatic speech recognition (ASR) to the visual modality has been shown to greatly incre...
In this thesis, a number of important issues relating to the use of both audio and video information...
Abstract—This paper presents the design and evaluation of a speaker-independent audio-visual speech ...
Abstract. In this paper an evaluation of visual speech features is performed specifically for the ta...
We address the problem of robust lip tracking, visual speech feature extraction, and sensor integrat...
We compare automatic recognition with human perception of audio-visual speech, in the large-vocabula...
The increase in the number of multimedia applications that require robust speech recognition systems...
Bimodal automatic speech segmentation using visual information together with audio data is introduce...
Humans are often able to compensate for noise degradation and uncertainty in speech information by a...
Humans are often able to compensate for noise degradation and uncertainty in speech information by a...