This paper describes a system developed for the GENEA (Generation and Evaluation of Non-verbal Behaviour for Embodied Agents) Challenge 2023. Our solution builds on an existing diffusion-based motion synthesis model. We propose a contrastive speech and motion pretraining (CSMP) module, which learns a joint embedding for speech and gesture with the aim of learning a semantic coupling between these modalities. The output of the CSMP module is used as a conditioning signal in the diffusion-based gesture synthesis model in order to achieve semantically aware co-speech gesture generation. Our entry achieved the highest human-likeness and the highest speech-appropriateness ratings among the submitted entries. This indicates that our system is a promising app...
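To make the idea of contrastive speech and motion pretraining concrete, the following is a minimal sketch of a symmetric, CLIP-style contrastive (InfoNCE) objective over paired speech and motion embeddings. The abstract does not state which loss CSMP uses, so the loss form, the function name `symmetric_contrastive_loss`, and the `temperature` value are all illustrative assumptions, not the authors' implementation. The sketch assumes that `speech_emb[i]` and `motion_emb[i]` come from the same time-aligned segment.

```python
import math


def l2_normalize(v):
    # Scale a vector to unit length (assumes the vector is non-zero).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax(row):
    # Numerically stable log-softmax over one row of logits.
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def symmetric_contrastive_loss(speech_emb, motion_emb, temperature=0.07):
    """Hypothetical CLIP-style loss; pair i of each list is a matching pair."""
    s = [l2_normalize(v) for v in speech_emb]
    m = [l2_normalize(v) for v in motion_emb]
    n = len(s)

    # logits[i][j]: scaled cosine similarity of speech i and motion j.
    logits = [
        [sum(a * b for a, b in zip(s[i], m[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    # Speech-to-motion direction: the correct "class" for row i is column i.
    loss_s2m = -sum(log_softmax(logits[i])[i] for i in range(n)) / n

    # Motion-to-speech direction: same computation on the transposed matrix.
    logits_t = [list(col) for col in zip(*logits)]
    loss_m2s = -sum(log_softmax(logits_t[j])[j] for j in range(n)) / n

    return 0.5 * (loss_s2m + loss_m2s)


# Toy embeddings (made up for illustration): matched pairs should score
# a lower loss than deliberately mismatched ones.
speech = [[1.0, 0.0], [0.0, 1.0]]
aligned_motion = [[0.9, 0.1], [0.1, 0.9]]
swapped_motion = [[0.1, 0.9], [0.9, 0.1]]

loss_aligned = symmetric_contrastive_loss(speech, aligned_motion)
loss_swapped = symmetric_contrastive_loss(speech, swapped_motion)
```

Under this kind of objective, minimising the loss pulls matching speech and motion segments together in the shared space and pushes mismatched ones apart. Embeddings from such a space are the sort of signal that could then condition a gesture synthesis model.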