This paper presents a new learning-based approach to speech synthesis that achieves mouth movements with rich and expressive articulation for novel audio input. From a database of 3D triphone motions, our algorithm picks the optimal sequences based on a triphone similarity measure, and concatenates them to create new utterances that include coarticulation effects. By using a Locally Linear Embedding (LLE) representation of feature points on 3D scans, we propose a model that defines a measure of similarity among visemes, and a system of viseme categories, which are used to define triphone substitution rules and a cost function. Moreover, we compute deformation vectors for several facial expressions, allowing expression variation to be smoot...
We present MikeTalk, a text-to-audiovisual speech synthesizer which converts input text into an audi...
Face to face dialogue is the most natural mode of communication between humans. The combination of ...
This paper presents a framework for speech-driven synthesis of real faces from a corpus of 3D video...
This paper presents a representation of visemes that defines a measure of similarity between differe...
This paper presents a novel approach for the generation of realistic speech synchronized 3D facial a...
In this paper we describe a method for the synthesis of visual speech movements using a hybrid unit ...
We describe a method for the synthesis of visual speech movements using a hybrid unit selection/mod...
Introduction This sketch concerns the animation of facial movement during speech production. In thi...
This publication can be retrieved by anonymous ftp to publications.ai.mit.edu. The pathname for this...
In this paper we describe a parameterisation of lip movements which maintains the dynamic structure ...
The author presents MASSY, the MODULAR AUDIOVISUAL SPEECH SYNTHESIZER. The system combines two appro...
In this paper, we present a new approach that generates synthetic mouth articulations from an audio...
This paper proposes and compares a range of methods to improve the naturalness of visual speech synt...
We present a framework for speech-driven synthesis of real faces from a corpus of 3D video of a pers...
Figure 1: Some synthesized frames of singing a part of "Hunter" (by Dido). While speech anima...