In this paper, we present a dynamic convolution kernel (DCK) strategy for convolutional neural networks. Using a fully convolutional network with the proposed DCKs, high-quality talking-face video can be generated from multi-modal sources (i.e., unmatched audio and video) in real time, and our trained model is robust to different identities, head postures, and input audio. Our proposed DCKs are specially designed for audio-driven talking-face video generation, leading to a simple yet effective end-to-end system. We also provide a theoretical analysis to interpret why DCKs work. Experimental results show that our method can generate high-quality talking-face video with background at 60 fps. Comparison and evaluation between our method and t...
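The abstract above does not say how the dynamic kernels are produced, so the following is only a minimal sketch of the general idea of a convolution whose kernel depends on a second input. It assumes, purely for illustration, that a single k x k kernel is predicted from an audio feature vector by a linear map and then applied to one grayscale frame. The names `predict_kernel`, `conv2d_same`, `dynamic_conv`, `weights`, and `bias` are hypothetical and not taken from the paper.

```python
def predict_kernel(audio_feat, weights, bias, k=3):
    """Hypothetical kernel generator: linear map from an audio feature to a k x k kernel.

    weights has k*k rows, each the same length as audio_feat; bias has k*k entries.
    """
    flat = [
        sum(w * a for w, a in zip(row, audio_feat)) + b_i
        for row, b_i in zip(weights, bias)
    ]
    return [flat[r * k:(r + 1) * k] for r in range(k)]


def conv2d_same(image, kernel):
    """Single-channel 2-D cross-correlation with zero padding (output size = input size)."""
    k = len(kernel)
    pad = k // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    # Pixels outside the frame are treated as zero.
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def dynamic_conv(frame, audio_feat, weights, bias, k=3):
    """Filter a frame with a kernel that is recomputed from the audio feature."""
    return conv2d_same(frame, predict_kernel(audio_feat, weights, bias, k))


# Usage: the same frame is filtered differently depending on the audio feature.
frame = [[1.0, 2.0], [3.0, 4.0]]
weights = [[0.0, 0.0] for _ in range(9)]
weights[4] = [1.0, 0.0]          # kernel centre equals the first audio feature
bias = [0.0] * 9
scaled = dynamic_conv(frame, [2.0, 0.0], weights, bias)
```

In this toy setup the kernel centre is set by the first audio feature, so changing the audio input rescales the output frame. A static convolution would apply the same kernel regardless of the audio.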
Speech-driven facial animation is the process which uses speech signals to automatically synthesize ...
Speech is a rich biometric signal that contains information about the identity, gender and emotional...
Talking face generation aims to synthesize a sequence of face images that correspond to a clip of sp...
We describe a method for generating a video of a talking face. The method takes still images of the ...
We present a method for generating a video of a talking face. The method takes as inputs: (i) still ...
This paper presents a simple method for speech video generation based on audio: given a piece of au...
In this paper, we present an audio-driven system capable of videorealistic synthesis of a speaker ut...
We present a method to edit a target portrait footage by taking a sequence of audio as input to synt...