The use of visual features in the form of lip movements to improve the performance of acoustic speech recognition has been shown to work well, particularly in noisy acoustic conditions. However, it is not known whether this technique can outperform speech recognition that incorporates well-known acoustic enhancement techniques such as spectral subtraction or multi-channel beamforming. Answering this question is especially important in an automotive environment, where it informs the design of an efficient human-vehicle computer interface. We perform a variety of speech recognition experiments on a challenging automotive speech dataset, and the results show that synchronous HMM-based audio-visual fusion can outperform traditional single as well as multi-cha...
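The abstract does not say how the audio and visual streams are combined. A common formulation of synchronous (state-level) HMM fusion, assumed here and not taken from the paper, scores each HMM state j with a weighted sum of stream log-likelihoods, log b_j(o_t) = λ_a log b_j^a(o_t^a) + (1 − λ_a) log b_j^v(o_t^v), and then decodes with ordinary Viterbi. The sketch below is a minimal illustration under that assumption: it uses a single diagonal Gaussian per state and stream, a hand-picked stream weight `lam_audio`, and toy feature values. All function names and numbers are hypothetical.

```python
import numpy as np

def diag_gauss_loglik(x, mean, var):
    """Log-density of x under one diagonal Gaussian per state.
    x: (D,), mean/var: (S, D) -> (S,) log-likelihoods."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var, axis=1)

def fused_loglik(o_audio, o_visual, audio_params, visual_params, lam_audio):
    """Stream-weighted state log-likelihoods (assumed multi-stream formulation)."""
    la = diag_gauss_loglik(o_audio, *audio_params)
    lv = diag_gauss_loglik(o_visual, *visual_params)
    return lam_audio * la + (1.0 - lam_audio) * lv

def viterbi(log_emit, log_trans, log_init):
    """Most likely state path. log_emit: (T, S), log_trans: (S, S), log_init: (S,)."""
    T, S = log_emit.shape
    delta = log_init + log_emit[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans          # scores[prev, cur]
        back[t] = np.argmax(scores, axis=0)          # best predecessor per state
        delta = scores[back[t], np.arange(S)] + log_emit[t]
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):                    # backtrack
        path.append(int(back[t, path[-1]]))
    return path[::-1]

def decode(audio, visual, audio_params, visual_params, log_trans, log_init, lam_audio):
    """Fuse both streams frame by frame (synchronously), then run Viterbi."""
    log_emit = np.stack([
        fused_loglik(a, v, audio_params, visual_params, lam_audio)
        for a, v in zip(audio, visual)
    ])
    return viterbi(log_emit, log_trans, log_init)

# Toy 2-state model with 1-D features (hypothetical values).
AUDIO_PARAMS = (np.array([[0.0], [3.0]]), np.array([[1.0], [1.0]]))
VISUAL_PARAMS = (np.array([[0.0], [3.0]]), np.array([[0.5], [0.5]]))
LOG_TRANS = np.log(np.array([[0.8, 0.2], [0.2, 0.8]]))
LOG_INIT = np.log(np.array([0.5, 0.5]))
AUDIO = np.array([[2.5], [2.5], [0.5], [0.5]])    # noise-corrupted audio frames
VISUAL = np.array([[0.1], [0.2], [2.9], [3.1]])   # cleaner lip features

for lam in (1.0, 0.3, 0.0):
    print(lam, decode(AUDIO, VISUAL, AUDIO_PARAMS, VISUAL_PARAMS, LOG_TRANS, LOG_INIT, lam))
```

Lowering `lam_audio` shifts trust toward the lip features, which is how fusion can help when cabin noise corrupts the audio stream. In practice such weights are typically tuned on held-out data, often per noise condition, rather than fixed by hand as in this sketch.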
Speech is inherently bimodal, relying on cues from the acoustic and visual speech modalities for per...
Humans are often able to compensate for noise degradation and uncertainty in speech information by a...
Automatic speech recognition (ASR) holds the promise of providing a natural, efficient, and safer me...
Acoustically, car cabins are extremely noisy, and as a consequence, existing audio-only speech recogn...
Extending automatic speech recognition (ASR) to the visual modality has been shown to greatly increa...
Speechreading increases intelligibility in human speech perception. This suggests that conventional ...
Speech is the most important tool of interaction among human beings. This has inspired researchers t...
Abstract — Visual speech information from the speaker’s mouth region has been successfully shown to ...
163 p. Thesis (Ph.D.), University of Illinois at Urbana-Champaign, 2000. Computer technologies have im...
© 2016 IEEE. Automatic speech recognition (ASR) has become a widespread and convenient mode of human-...
This paper describes an audio-visual speech recognition system for the Polish language and a set of perform...
This paper examines the utility of audio-visual speech for the two related tasks of speech and speak...
There has been growing interest in introducing speech as a new modality into the human-computer inte...