Visual information from a speaker's mouth region is known to improve automatic speech recognition robustness. However, the vast majority of audio-visual automatic speech recognition (AVASR) studies assume frontal images of the speaker's face, an assumption that does not always hold in realistic human-computer interaction (HCI) scenarios. One such case of interest is HCI inside smart rooms equipped with pan-tilt-zoom (PTZ) cameras that closely track the subject's head. Because these cameras are fixed in space, however, they cannot always obtain frontal views of the speaker. AVASR from non-frontal views is therefore required, as well as fusion of multiple camera views, if available. In this paper, we report our very preliminary work on this subject. I...
In this paper we study the adaptation of visual and audio-visual speech recognition systems to non-i...
Biometrics has been a topic of great interest since the advent of the information age and will soon ...
Acoustically, car cabins are extremely noisy and as a consequence, existing audio-only speech recogn...
Visual information from a speaker's mouth region is known to improve automatic speech recognition ro...
Visual information from a speaker's mouth region is known to improve automatic speech recognition...
Visual information from a speaker's mouth region is known to improve automatic speech recognition ro...
Visual speech cues are known to improve the performance of automatic speech recognition (ASR). Howev...
The vast majority of studies in the field of audio-visual automatic speech recognition (AVASR) as...
Visual speech information from the speaker’s mouth region has been successfully shown to ...
Despite significant advances in the area of Automatic Speech Recognition (ASR), systems still resul...
Automatic speech recognition (ASR) holds the promise of providing a natural, efficient, and safer me...
This paper describes the audio-visual database collected at AT&T Labs-Research for the study of...
Audiovisual automatic speech recognition (AV-ASR) is an extension of ASR that ...
This paper presents an Active Appearance Model (AAM) based multiple camera visual speech rec...
We present a prototype for the automatic recognition of audiovisual speech, developed to augment the...