This paper proposes two network architectures to perform video generation from captions using Variational Autoencoders. We adopt a new perspective towards video generation where we use attention as well as allow the captions to be combined with the long-term and short-term dependencies between video frames and thus generate a video in an incremental manner. Our experiments demonstrate our network architectures’ ability to distinguish between objects, actions and interactions in a video and combine them to generate videos for unseen captions. Our second network also exhibits the capability to perform spatio-temporal style transfer when asked to generate videos for a sequence of captions. We also show that the network’s ability to learn a lat...
Video action recognition has gained much attention in recent years by the research community for its...
© 20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for a...
In the modern era, image captioning has become one of the most widely required tools. Moreover, ther...
This paper proposes a network architecture to perform variable length semantic video generation usin...
Video captioning refers to the process of conveying information of video clips through automatically...
Generating videos from text has proven to be a significant challenge for existing generative models....
Jia X., ''Generating text and images with deep learning'', Proefschrift voorgedragen tot het behalen...
With the rise of deep learning approaches, great progress has been achieved on object recognition ta...
We investigate architectures of discriminatively trained deep Convolutional Networks (ConvNets) for ...
Video captioning refers to the task of generating a natural language sentence that explains the cont...
This paper instroduces an unsupervised framework to extract semantically rich features for video rep...
Automatically generating natural language description for video is an extremely complicated and chal...
In this thesis, we propose novel deep learning algorithms for the vision and language tasks, includi...
Automated analysis of videos for content understanding is one of the most challenging and well resea...
Applying machine learning (ML), and especially deep learning, to understand visual content is becomi...
Video action recognition has gained much attention in recent years by the research community for its...
© 20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for a...
In the modern era, image captioning has become one of the most widely required tools. Moreover, ther...
This paper proposes a network architecture to perform variable length semantic video generation usin...
Video captioning refers to the process of conveying information of video clips through automatically...
Generating videos from text has proven to be a significant challenge for existing generative models....
Jia X., ''Generating text and images with deep learning'', Proefschrift voorgedragen tot het behalen...
With the rise of deep learning approaches, great progress has been achieved on object recognition ta...
We investigate architectures of discriminatively trained deep Convolutional Networks (ConvNets) for ...
Video captioning refers to the task of generating a natural language sentence that explains the cont...
This paper instroduces an unsupervised framework to extract semantically rich features for video rep...
Automatically generating natural language description for video is an extremely complicated and chal...
In this thesis, we propose novel deep learning algorithms for the vision and language tasks, includi...
Automated analysis of videos for content understanding is one of the most challenging and well resea...
Applying machine learning (ML), and especially deep learning, to understand visual content is becomi...
Video action recognition has gained much attention in recent years by the research community for its...
© 20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for a...
In the modern era, image captioning has become one of the most widely required tools. Moreover, ther...