This paper describes a complete system for audio-visual recognition of continuous speech, including robust lip tracking, visual feature extraction, noise-robust acoustic feature extraction, and sensor integration. An appearance-based model of the articulators, which represents linguistically important features, is learned from example images and is used to locate, track, and recover visual speech information. We tackle the problem of joint temporal modeling of the acoustic and visual speech signals by applying multi-stream hidden Markov models. This approach allows different temporal topologies and levels of stream integration to be defined, and hence models temporal dependencies more accurately than traditional approaches. We pr...
This paper gives an overview of the principles of a system for phoneme based, large vocabulary, cont...
This paper gives an overview of the principles of a system for phoneme based, large vocabulary, cont...
Speechreading increases intelligibility in human speech perception. This suggests that conventional ...
We address the problem of robust lip tracking, visual speech feature extraction, and sensor integrat...
We address the problem of robust lip tracking, visual speech feature extraction, and sensor integrat...
The increase in the number of multimedia applications that require robust speech recognition systems...
With the increase in the computational complexity of recent computers, audio-visual speech recogniti...
This paper presents the design and evaluation of a speaker-independent audio-visual speech ...
The Multi-Stream automatic speech recognition approach was investigated in this work as a framework ...
Visual speech information from the speaker’s mouth region has been successfully shown to ...
In this paper, we present a new approach to visual speech recognition which improves contextual mode...
In this paper, we present a new approach to visual speech recognition which improves contextual mode...
Speech recognition can be improved by using visual information in the form of lip movements of the s...
In this paper, we present a new approach to visual speech recognition which improves contextual mode...
This work consists of designing a continuous speech recognition system using pattern recognition tec...