This paper proposes a new method for bimodal information fusion in audio-visual speech recognition, in which cross-modal association is considered at two levels. First, the acoustic and visual data streams are combined at the feature level using canonical correlation analysis, which addresses audio-visual synchronization and exploits the cross-modal correlation. Second, the information streams are integrated at the decision level, so that fusion adapts to the noise condition of the given speech datum. Experimental results demonstrate that the proposed method yields noise-robust recognition performance without a priori knowledge of the noise conditions of the speech data.
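The two fusion levels described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify how the canonical projections are used or how the decision-level weight is adapted, so the function names (`cca_fit`, `cca_fuse`, `reliability`, `fuse_decisions`), the ridge term `reg`, the concatenation of the projected streams, and the score-gap reliability measure are all assumptions made for illustration.

```python
import numpy as np

def cca_fit(X, Y, k, reg=1e-6):
    """Estimate the top-k canonical projections for audio X and visual Y.

    X: (n, dx) audio features, Y: (n, dy) visual features, frame-aligned.
    reg is a small ridge term (an assumption) that keeps the covariances invertible.
    """
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    # Whiten both streams with Cholesky factors: Sxx = Lx Lx^T, Syy = Ly Ly^T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    # M = Lx^{-1} Sxy Ly^{-T}; its singular values are the canonical correlations.
    M = np.linalg.solve(Lx, Sxy)
    M = np.linalg.solve(Ly, M.T).T
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    Wx = np.linalg.solve(Lx.T, U[:, :k])
    Wy = np.linalg.solve(Ly.T, Vt.T[:, :k])
    return mx, my, Wx, Wy, s[:k]

def cca_fuse(X, Y, mx, my, Wx, Wy):
    """Feature-level fusion (assumed form): concatenate the projected streams."""
    return np.hstack([(X - mx) @ Wx, (Y - my) @ Wy])

def reliability(loglik):
    """Hypothetical confidence measure: mean gap between the best score and the rest."""
    loglik = np.asarray(loglik, dtype=float)
    return (loglik.max() - loglik).sum() / (len(loglik) - 1)

def fuse_decisions(ll_a, ll_b):
    """Decision-level fusion: weight each stream's class log-likelihoods by its reliability."""
    ll_a, ll_b = np.asarray(ll_a, dtype=float), np.asarray(ll_b, dtype=float)
    ra, rb = reliability(ll_a), reliability(ll_b)
    lam = ra / (ra + rb) if ra + rb > 0 else 0.5
    return lam * ll_a + (1 - lam) * ll_b, lam
```

In this sketch a stream whose class scores are nearly flat, as is typical of acoustic scores under heavy noise, receives a small weight, so the combined decision leans on the more confident stream without any explicit estimate of the noise level.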
One of the most commonly used audiovisual fusion approaches is feature-level fusion where the audio ...
The use of visual features in audio-visual speech recognition (AVSR) is justified by both the speech...
Visual speech recognition is a challenging research problem with a particular practical application ...
Extending automatic speech recognition (ASR) to the visual modality has been shown to greatly increa...
Extending automatic speech recognition (ASR) to the visual modality has been shown to greatly incre...
Automatic speech recognition can potentially benefit from the lip motion patterns, complementing aco...
Abstract — Visual speech information from the speaker’s mouth region has been successfully shown to ...
HSC2001: IEEE International Workshop on Hands-Free Speech Communication, April 9-11, 2001, Kyoto, ...
A major goal of current speech recognition research is to improve the robustness of recognition syst...
We present recent work on improving the performance of automated speech recognizers by using additio...
Speechreading increases intelligibility in human speech perception. This suggests that conventional ...
In this paper, we address the problem of automatic discovery of speech patterns using audio-visual i...
In this paper we propose two alternatives to overcome the natural asynchrony of modalities in Audio-...
As evidence of a link between the various human communication production domains has become more pro...
Multimodal information processing has received considerable attention in recent years. The focus of ...