Visual speech cues are known to improve the performance of automatic speech recognition (ASR). However, most prior work has relied mainly on the speaker's frontal pose. We therefore introduce a new database for large-vocabulary audio-visual automatic speech recognition (AV-ASR) that contains not only frontal face images but also face images taken from different angles (multi-view face images). A second contribution of this paper is a new algorithm that can model audio and visual characteristics between phones. Finally, we conducted large-vocabulary continuous speech recognition experiments on the new database using the new algorithm. Experimental results show that the proposed AV-ASR system achieved high accuracy even if there are...
Automatic speech recognition (ASR) permits effective interaction between humans and machines in envi...
Audio-visual recognition systems are becoming popular because they overcome certain problems of traditi...
The increase in the number of multimedia applications that require robust speech recognition systems...
The vast majority of studies in the field of audio-visual automatic speech recognition (AVASR) as...
Visual information from a speaker's mouth region is known to improve automatic speech recognition ro...
Visual information from a speaker's mouth region is known to improve automatic speech recognition...
Visual speech information from the speaker’s mouth region has been successfully shown to ...
As one of the techniques for robust speech recognition under noisy environments, audio-visual speech...
Recently, automatic speech recognition (ASR) and visual speech recognition (VSR) have been widely re...
This paper presents an Active Appearance Model (AAM) based multiple camera visual speech rec...
In this paper we study the adaptation of visual and audio-visual speech recognition systems to non-i...
Automatic speech recognition (ASR) holds the promise of providing a natural, efficient, and safer me...
In audio-visual automatic speech recognition (AVASR), no research to date has been conducted into th...
Visual information from a speaker's mouth region is known to improve automatic speech recognition ro...
Audiovisual automatic speech recognition (AV-ASR) is an extension of ASR that ...