The Multi-Stream automatic speech recognition approach was investigated in this work as a framework for Audio-Visual data fusion and speech recognition. This method presents many potential advantages for such a task. In particular, it allows for synchronous decoding of continuous speech while still permitting some asynchrony between the visual and acoustic information streams. First, the Multi-Stream formalism is briefly recalled. Then, building on the motivations for the Multi-Stream approach, experiments on the M2VTS multimodal database are presented and discussed. To our knowledge, these are the first experiments on multi-speaker continuous Audio-Visual Speech Recognition (AVSR). It is shown that the Multi-Stream approach can yield improved Audio-Visual spee...
Analysis of data on human auditory processing suggests a machine recognition paradigm, in which parall...
Human speech processing is often a multimodal process combining audio and visual processing. Eyes an...
In this thesis, a number of important issues relating to the use of both audio and video information...
This paper describes a complete system for audio-visual recognition of continuous speech including r...
We address the problem of robust lip tracking, visual speech feature extraction, and sensor integrat...
Visual speech information from the speaker’s mouth region has been successfully shown to ...
The method which is called the “tandem approach” in speech recognition has been shown to increase ...
With the increase in the computational complexity of recent computers, audio-visual speech recogniti...
The increase in the number of multimedia applications that require robust speech recognition systems...
This paper proposes a real-time algorithmic framework for Automatic Speech Recognition (ASR) in pres...
Despite sophisticated present day automatic speech recognition (ASR) techniques, a single recognizer...
This paper presents the design and evaluation of a speaker-independent audio-visual speech ...
Automatic speech recognition (ASR) permits effective interaction between humans and machines in envi...
Multi-stream and multi-band methods can improve the accuracy of speech recognition systems without o...
We report progress in the use of multi-stream spectro-temporal features for both small and large voc...