It is well believed that video captioning is a fundamental but challenging task in both computer vision and artificial intelligence fields. The prevalent approach is to map an input video to a variable-length output sentence in a sequence to sequence manner via Recurrent Neural Network (RNN). Nevertheless, the training of RNN still suffers to some degree from vanishing/exploding gradient problem, making the optimization difficult. Moreover, the inherently recurrent dependency in RNN prevents parallelization within a sequence during training and therefore limits the computations. In this paper, we present a novel design — Temporal Deformable Convolutional Encoder-Decoder Networks (dubbed as TDConvED) that fully employ convolutions in both en...
Automatically generating natural language description for video is an extremely complicated and chal...
In this paper, we propose a novel approach to video captioning based on adversarial learning and lon...
IEEE Visual narrating focuses on generating semantic descriptions to summarize visual content of ima...
© 2016 IEEE. Recently, deep learning approach, especially deep Convolutional Neural Networks (ConvNe...
The temporal events in video sequences often have long-term dependencies which are difficult to be h...
Audio captioning aims to automatically generate a natural language description of an audio clip. Mos...
The use of Recurrent Neural Networks for video captioning has recently gained a lot of attention, si...
This work demonstrates the implementation and use of an encoder-decoder model to perform a many-to-m...
Video captioning has picked up a considerable measure of attention thanks to the use of Recurrent Ne...
Abstract: This paper discusses an efficient approach to captioning a given image using a combination...
Video captioning, in essential, is a complex natural process, which is affected by various uncertain...
We address the challenge of learning good video representations by explicitly modeling the relations...
Abstract Dense video captioning (DVC) detects multiple events in an input video and generates natura...
Video captioning refers to the task of generating a natural language sentence that explains the cont...
Video captioning via encoder–decoder structures is a successful sentence generation method. In addit...
Automatically generating natural language description for video is an extremely complicated and chal...
In this paper, we propose a novel approach to video captioning based on adversarial learning and lon...
IEEE Visual narrating focuses on generating semantic descriptions to summarize visual content of ima...
© 2016 IEEE. Recently, deep learning approach, especially deep Convolutional Neural Networks (ConvNe...
The temporal events in video sequences often have long-term dependencies which are difficult to be h...
Audio captioning aims to automatically generate a natural language description of an audio clip. Mos...
The use of Recurrent Neural Networks for video captioning has recently gained a lot of attention, si...
This work demonstrates the implementation and use of an encoder-decoder model to perform a many-to-m...
Video captioning has picked up a considerable measure of attention thanks to the use of Recurrent Ne...
Abstract: This paper discusses an efficient approach to captioning a given image using a combination...
Video captioning, in essential, is a complex natural process, which is affected by various uncertain...
We address the challenge of learning good video representations by explicitly modeling the relations...
Abstract Dense video captioning (DVC) detects multiple events in an input video and generates natura...
Video captioning refers to the task of generating a natural language sentence that explains the cont...
Video captioning via encoder–decoder structures is a successful sentence generation method. In addit...
Automatically generating natural language description for video is an extremely complicated and chal...
In this paper, we propose a novel approach to video captioning based on adversarial learning and lon...
IEEE Visual narrating focuses on generating semantic descriptions to summarize visual content of ima...