A pressing limitation of current image captioning models is their tendency to produce generic captions that omit the interesting details that make each image unique. To address this limitation, we propose an approach that enforces a stronger alignment between image regions and specific segments of text. The model architecture comprises a visual region proposer, a region-order planner, and a region-guided caption generator. The region-guided caption generator incorporates a novel information gate which allows visual and textual input of different frequencies and dimensionalities in a Recurrent Neural Network
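The abstract does not give the form of the information gate, so the following is only a minimal sketch of one generic way to fuse inputs of different dimensionalities and update frequencies in a recurrent cell. It is not the authors' formulation. The class `GatedFusionCell`, the function `run_caption_plan`, the sigmoid gate, and all weight names are hypothetical and introduced purely for illustration. Here the region feature changes only when the planner moves to the next region, while a new word embedding arrives at every step.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def rand_matrix(rows, cols, rng):
    """Small random weights; a stand-in for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class GatedFusionCell:
    """Hypothetical recurrent cell that gates between a visual and a textual input."""

    def __init__(self, visual_dim, word_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # Separate projections map each modality to the shared hidden size.
        self.W_v = rand_matrix(hidden_dim, visual_dim, rng)
        self.W_w = rand_matrix(hidden_dim, word_dim, rng)
        self.W_h = rand_matrix(hidden_dim, hidden_dim, rng)
        # The gate sees the previous state and both projected inputs.
        self.W_g = rand_matrix(hidden_dim, 3 * hidden_dim, rng)

    def step(self, h, region_feat, word_emb):
        v = matvec(self.W_v, region_feat)
        w = matvec(self.W_w, word_emb)
        # Per-dimension gate in (0, 1): near 1 favours the visual input.
        gate = [sigmoid(z) for z in matvec(self.W_g, h + v + w)]
        fused = [g * vi + (1.0 - g) * wi for g, vi, wi in zip(gate, v, w)]
        rec = matvec(self.W_h, h)
        h_new = [math.tanh(r + f) for r, f in zip(rec, fused)]
        return h_new, gate


def run_caption_plan(cell, regions, words_per_region):
    """Visit regions in planned order; each region feature is reused for all its words."""
    h = [0.0] * cell.hidden_dim
    states = []
    for region_feat, word_embs in zip(regions, words_per_region):
        for word_emb in word_embs:  # word input changes every step
            h, _ = cell.step(h, region_feat, word_emb)
            states.append(h)
    return states


if __name__ == "__main__":
    cell = GatedFusionCell(visual_dim=6, word_dim=4, hidden_dim=5)
    regions = [[0.1] * 6, [0.2] * 6]                    # two planned regions
    words = [[[0.3] * 4] * 2, [[0.4] * 4] * 3]          # 2 words, then 3 words
    print(len(run_caption_plan(cell, regions, words)))  # one state per word
```

In this sketch the region order is simply the order of the `regions` list. A real system would obtain it from the region-order planner and would learn the weights rather than sample them.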
Automatic image captioning, a highly challenging research problem, aims to understand and describe t...
We hypothesize that end-to-end neural image captioning systems work seemingly well because they expl...
Dense captioning provides detailed captions of complex visual scenes. While a number of successes ha...
Controllable Image Captioning is a recent sub-field in the multi-modal task of Image Captioning wher...
Current captioning approaches can describe images using black-box architectures whose behavior is ha...
Dense captioning (DC), which provides a comprehensive context u...
Image Captioning is a task that requires models to acquire a multi-modal understanding of the world ...
We propose ``Areas of Attention'', a novel attention-based model for automatic...
Describing the content of an image is a challenging task. To enable detailed description, it require...
Image captioning in recent research generally focuses upon small, relatively high-level captions. Th...
Two recent approaches have achieved state-of-the-art results in image captioning. The first uses a ...
Image captioning is the task of automatically generating a description of an image. Traditional imag...
A large number of images with accompanying text captions are available on the Internet. These are va...
A methodology is described for the generation of relevant captions for images of an extensiv...