© 2017 ACM. Recently, a new type of video understanding task called Movie-Fillin- the-Blank (MovieFIB) has attracted many research attentions. Given a pair of movie clip and description with one blank word as input, MovieFIB aims to automatically predict the blank word. Because of the advantage in processing sequence data, Long-Short Term Memory (LSTM) has been used as a key component in existing MovieFIB methods to generate representations of videos and descriptions. However, most of these methods fail to emphasize the salient parts of videos. To address this problem, in this paper we propose to use a novel LSTM network called LSTM with Linguistic gate (LSTMwL), which exploits adaptive temporal attention for MovieFIB. Specifically, we firs...
Not every visual media production is equally retained in memory. Recent studies have shown that the ...
Automatically describing video content with natural language is a fundamental challenging that has r...
Analyzing and understanding human actions in long-range videos has promising applications, such as v...
Given a video and a description sentence with one missing word, \u27source sentence\u27, Video-Fill-...
Recent progress in using Long Short-Term Memory (LSTM) for image description has motivated the explo...
© 1999-2012 IEEE. Recent progress in using long short-term memory (LSTM) for image captioning has mo...
Video captioning has been attracting broad research attention in multimedia community. However, most...
© 2013 IEEE. Video captioning has been attracting broad research attention in the multimedia communi...
Recent progress has been made in using attention based encoder-decoder framework for video captionin...
We use Long Short Term Memory (LSTM) networks to learn representations of video se-quences. Our mode...
This work demonstrates the implementation and use of an encoder-decoder model to perform a many-to-m...
In this paper, we propose a novel approach to video captioning based on adversarial learning and lon...
Movies provide us with a mass of visual content as well as attracting stories. Existing methods have...
Automatic video captioning is challenging due to the complex interactions in dynamic real scenes. A ...
Transcription of multimedia data sources is often a challenging automatic speech recognition (ASR) t...
Not every visual media production is equally retained in memory. Recent studies have shown that the ...
Automatically describing video content with natural language is a fundamental challenging that has r...
Analyzing and understanding human actions in long-range videos has promising applications, such as v...
Given a video and a description sentence with one missing word, \u27source sentence\u27, Video-Fill-...
Recent progress in using Long Short-Term Memory (LSTM) for image description has motivated the explo...
© 1999-2012 IEEE. Recent progress in using long short-term memory (LSTM) for image captioning has mo...
Video captioning has been attracting broad research attention in multimedia community. However, most...
© 2013 IEEE. Video captioning has been attracting broad research attention in the multimedia communi...
Recent progress has been made in using attention based encoder-decoder framework for video captionin...
We use Long Short Term Memory (LSTM) networks to learn representations of video se-quences. Our mode...
This work demonstrates the implementation and use of an encoder-decoder model to perform a many-to-m...
In this paper, we propose a novel approach to video captioning based on adversarial learning and lon...
Movies provide us with a mass of visual content as well as attracting stories. Existing methods have...
Automatic video captioning is challenging due to the complex interactions in dynamic real scenes. A ...
Transcription of multimedia data sources is often a challenging automatic speech recognition (ASR) t...
Not every visual media production is equally retained in memory. Recent studies have shown that the ...
Automatically describing video content with natural language is a fundamental challenging that has r...
Analyzing and understanding human actions in long-range videos has promising applications, such as v...