Wachsmuth S, Brandt-Pook H, Socher G, Kummert F, Sagerer G. Multilevel Integration of Vision and Speech Understanding Using Bayseian Networks. In: Christensen HI, ed. Computer Vision Systems. Lecture Notes in Computer Science. Vol 1542. Berlin, Heidelberg: Springer; 1999: 231-254.The interaction of image and speech processing is a crucial property of multimedia systems. Classical systems using inferences on pure qualitative high level descriptions miss a lot of information when concerned with erroneous, vague, or incomplete data. We propose a new architecture that integrates various levels of processing by using multiple representations of the visually observed scene. They are vertically connected by Bayesian networks in order to find the ...
International audienceThe language acquisition literature shows that children do not build their lex...
Wachsmuth S, Swadzba A. Probabilistic Scene Modeling for Situated Computer Vision. In: Cohn AG, Hogg...
Current cognitive models of spoken word recognition and comprehension are underspecified with respec...
Wachsmuth S, Socher G, Brandt-Pook H, Kummert F, Sagerer G. Integration of Vision and Speech Underst...
Wachsmuth S, Sagerer G. Bayesian Networks for Speech and Image Integration. In: Proc. of 18th Natio...
Wachsmuth S. Multi-modal scene understanding using probabilistic models. Bielefeld (Germany): Bielef...
Image understanding denotes not only the ability to extract specific, non-numerical information from...
Moratz R, Eikmeyer H-J, Hildebrandt B, Kummert F, Rickheit G, Sagerer G. Integrating Speech and Sele...
In this paper, we explore neural network models that learn to associate segments of spoken audio cap...
Wachsmuth S, Fink GA, Kummert F, Sagerer G. Using Speech in Visual Object Recognition. Informatik Ak...
We study the problem of automatic visual speech recognition (VSR) using dynamic Bayesian network (DB...
A stochastic approach based on Dynamic Bayesian Networks (DBNs) is introduced for spoken language un...
The use of visual features in audio-visual speech recognition (AVSR) is justified by both the speec...
International audienceThanks to their different senses, human observers acquire multiple information...
Deep neural networks have boosted the convergence of multimedia data analytics in a unified framewor...
International audienceThe language acquisition literature shows that children do not build their lex...
Wachsmuth S, Swadzba A. Probabilistic Scene Modeling for Situated Computer Vision. In: Cohn AG, Hogg...
Current cognitive models of spoken word recognition and comprehension are underspecified with respec...
Wachsmuth S, Socher G, Brandt-Pook H, Kummert F, Sagerer G. Integration of Vision and Speech Underst...
Wachsmuth S, Sagerer G. Bayesian Networks for Speech and Image Integration. In: Proc. of 18th Natio...
Wachsmuth S. Multi-modal scene understanding using probabilistic models. Bielefeld (Germany): Bielef...
Image understanding denotes not only the ability to extract specific, non-numerical information from...
Moratz R, Eikmeyer H-J, Hildebrandt B, Kummert F, Rickheit G, Sagerer G. Integrating Speech and Sele...
In this paper, we explore neural network models that learn to associate segments of spoken audio cap...
Wachsmuth S, Fink GA, Kummert F, Sagerer G. Using Speech in Visual Object Recognition. Informatik Ak...
We study the problem of automatic visual speech recognition (VSR) using dynamic Bayesian network (DB...
A stochastic approach based on Dynamic Bayesian Networks (DBNs) is introduced for spoken language un...
The use of visual features in audio-visual speech recognition (AVSR) is justified by both the speec...
International audienceThanks to their different senses, human observers acquire multiple information...
Deep neural networks have boosted the convergence of multimedia data analytics in a unified framewor...
International audienceThe language acquisition literature shows that children do not build their lex...
Wachsmuth S, Swadzba A. Probabilistic Scene Modeling for Situated Computer Vision. In: Cohn AG, Hogg...
Current cognitive models of spoken word recognition and comprehension are underspecified with respec...