Given a video and a description sentence with one missing word (the 'source sentence'), the Video-Fill-In-the-Blank (VFIB) problem is to find the missing word automatically. Both the contextual information of the sentence and the visual cues from the video are important for inferring the missing word accurately. The source sentence is broken into two fragments: the left fragment (before the blank) and the right fragment (after the blank). Traditional Recurrent Neural Networks cannot encode this structure accurately because the missing word can vary widely in both its location and its type within the source sentence. For example, a missing word can be the first word or be in the middle of the ...
Localizing moments in the videos has been a new challenging task in the field of Computer Science to...
Not every visual media production is equally retained in memory. Recent studies have shown that the ...
Automated analysis of videos for content understanding is one of the most challenging and well resea...
© 2017 ACM. Recently, a new type of video understanding task called Movie-Fill-in-the-Blank (MovieFI...
Recent progress in using Long Short-Term Memory (LSTM) for image description has motivated the explo...
© 1999-2012 IEEE. Recent progress in using long short-term memory (LSTM) for image captioning has mo...
This paper introduces a new problem, called Visual Text Correction (VTC), i.e., finding and replacin...
We have witnessed the tremendous growth of videos over the Internet, where most of these videos are ...
Video captioning has been attracting broad research attention in the multimedia community. However, most...
Recent progress has been made in using an attention-based encoder-decoder framework for video captionin...
© 2013 IEEE. Video captioning has been attracting broad research attention in the multimedia communi...
Automatic video captioning is challenging due to the complex interactions in dynamic real scenes. A ...
This paper studies the problem of temporal moment localization in a long untrimmed video using natur...
This work demonstrates the implementation and use of an encoder-decoder model to perform a many-to-m...