Object detection, visual relationship detection, and image captioning, which are the three main visual tasks in scene understanding, are highly correlated and correspond to different semantic levels of scene image. However, the existing captioning methods convert the extracted image features into description text, and the obtained results are not satisfactory. In this work, we propose a Multi-level Semantic Context Information (MSCI) network with an overall symmetrical structure to leverage the mutual connections across the three different semantic layers and extract the context information between them, to solve jointly the three vision tasks for achieving the accurate and comprehensive description of the scene image. The model uses a feat...
Visual semantic information comprises two important parts: the meaning of each visual semantic unit ...
Exploiting relationships between objects for image and video captioning has received increasing atte...
Image captioning has been recently gaining a lot of attention thanks to the impressive achievements ...
In this paper, we propose a novel encoding framework to learn the multi-scale context information of...
Existing image captioning approaches fail to generate fine-grained captions due to the lack of rich ...
Image caption enables computers to generate a text description of images automatically. However, the...
Image captioning is the task of automatically generating a description of an image. Traditional imag...
Visual attention mechanism has been widely used by image captioning model in order to dynamically at...
In this work, we present a novel multi-scale feature fusion network (M-FFN) for image captioning tas...
In the modern era, image captioning has become one of the most widely required tools. Moreover, ther...
Dense captioning is a challenging task which not only detects visual elements in images but also gen...
As one of the most intelligent beings on the planet, we are equipped with the most powerful visual a...
Recently, attention mechanism has been successfully applied in image captioning, but the existing at...
Automatic image caption prediction is a challenging task in natural language processing. Most of the...
| openaire: EC/H2020/780069/EU//MeMADDense captioning (DC), which provides a comprehensive context u...
Visual semantic information comprises two important parts: the meaning of each visual semantic unit ...
Exploiting relationships between objects for image and video captioning has received increasing atte...
Image captioning has been recently gaining a lot of attention thanks to the impressive achievements ...
In this paper, we propose a novel encoding framework to learn the multi-scale context information of...
Existing image captioning approaches fail to generate fine-grained captions due to the lack of rich ...
Image caption enables computers to generate a text description of images automatically. However, the...
Image captioning is the task of automatically generating a description of an image. Traditional imag...
Visual attention mechanism has been widely used by image captioning model in order to dynamically at...
In this work, we present a novel multi-scale feature fusion network (M-FFN) for image captioning tas...
In the modern era, image captioning has become one of the most widely required tools. Moreover, ther...
Dense captioning is a challenging task which not only detects visual elements in images but also gen...
As one of the most intelligent beings on the planet, we are equipped with the most powerful visual a...
Recently, attention mechanism has been successfully applied in image captioning, but the existing at...
Automatic image caption prediction is a challenging task in natural language processing. Most of the...
| openaire: EC/H2020/780069/EU//MeMADDense captioning (DC), which provides a comprehensive context u...
Visual semantic information comprises two important parts: the meaning of each visual semantic unit ...
Exploiting relationships between objects for image and video captioning has received increasing atte...
Image captioning has been recently gaining a lot of attention thanks to the impressive achievements ...