We investigate the use of local, frame-dependent reliability indicators of the audio and visual modalities, as a means of estimating stream exponents of multi-stream hidden Markov models for audio-visual automatic speech recognition. We consider two such indicators for each modality, defined as functions of the speech-class conditional observation probabilities of appropriate audio- or visual-only classifiers. We subsequently map the four reliability indicators into the stream exponents of a state-synchronous, two-stream hidden Markov model, as a sigmoid function of their linear combination. We propose two algorithms to estimate the sigmoid weights, based on the maximum conditional likelihood and minimum classification error criteria. We demonst...
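The mapping from reliability indicators to stream exponents described above can be illustrated with a minimal sketch. It assumes that the audio exponent is the sigmoid output and that the visual exponent is its complement (so the two sum to one); the abstract does not state this constraint. The function names, the weight values, and the bias term are hypothetical, and the indicators themselves are taken as given inputs rather than computed from classifier outputs.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def audio_exponent(indicators, weights, bias=0.0):
    """Map frame-level reliability indicators to an audio stream exponent.

    indicators: four values (two per modality) for the current frame.
    weights, bias: sigmoid parameters (hypothetical values; in the abstract
    these are estimated by maximum conditional likelihood or minimum
    classification error training, which is not shown here).
    """
    if len(indicators) != len(weights):
        raise ValueError("indicators and weights must have the same length")
    z = bias + sum(w * d for w, d in zip(weights, indicators))
    return sigmoid(z)


def two_stream_log_likelihood(log_p_audio, log_p_visual, lam_audio):
    """Exponent-weighted state log-likelihood of a two-stream HMM.

    Assumption: the visual exponent is 1 - lam_audio.
    """
    return lam_audio * log_p_audio + (1.0 - lam_audio) * log_p_visual


# Example frame: all indicators zero gives equal weighting of both streams.
lam = audio_exponent([0.0, 0.0, 0.0, 0.0], [1.0, 1.0, -1.0, -1.0])
score = two_stream_log_likelihood(-2.0, -4.0, lam)
```

Because the exponent is recomputed at every frame, a frame with degraded audio (and therefore lower audio reliability indicators, under positive audio weights) shifts the combined score toward the visual stream.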
In pattern recognition one usually relies on measuring a set of informative features to perform task...
In this paper we present the application of hidden conditional random fields (HCRFs) to modeling spe...
Despite sophisticated present day automatic speech recognition (ASR) techniques, a single recognizer...
We present a method for dynamically integrating audio-visual information for speech recognition, bas...
We present a method for multimodal fusion based on the estimated reliability of each individual moda...
We present a method for dynamically integrating audio-visual information for speech recognition, bas...
Merging decisions from different modalities is a crucial problem in Audio-Visual Speech Recognition...
A prototype multi-stream system with a performance monitor for stream selection is proposed to recog...
We address the problem of robust lip tracking, visual speech feature extraction, and sensor integrat...
A prototype multi-stream system with a performance monitor for stream selection is proposed to recog...
This paper describes a complete system for audio-visual recognition of continuous speech including r...
© 2016 IEEE. Automatic speech recognition (ASR) has become a widespread and convenient mode of human-...
With the increase in the computational complexity of recent computers, audio-visual speech recogniti...
Most current audio-visual automatic speech recognition (AV-ASR) systems use static weights to levera...
Visual speech information from the speaker’s mouth region has been successfully shown to ...