Compared to natural images, understanding scientific figures is particularly hard for machines. However, there is a valuable source of information in scientific literature that until now has remained untapped: the correspondence between a figure and its caption. In this paper we investigate what can be learnt by looking at a large number of figures and reading their captions, and introduce a figure-caption correspondence learning task that makes use of our observations. Training visual and language networks without supervision other than pairs of unconstrained figures and captions is shown to successfully solve this task. We also show that transferring lexical and semantic knowledge from a knowledge graph significantly enriches the resultin...
With the advancement in deep learning, the consolidation of text classification and image processing...
Conceptual interpretation of languages has gathered peak interest in the world of artificial intelli...
Powered by deep convolutional networks and large scale visual datasets, modern computer vision syste...
Each time we ask for an object, describe a scene, follow directions or read a document containi...
Given an unstructured collection of captioned images of cluttered scenes featuring a variety of obje...
This paper presents a novel approach for automatically generating image descriptions: visual detecto...
This paper presents a novel approach for automatically generating image descriptions: visual detecto...
Automatic image caption prediction is a challenging task in natural language processing. Most of the...
Visual attention plays an important role to understand images and demonstrates its effectiveness in ...
In this thesis, we propose novel deep learning algorithms for the vision and language tasks, includi...
Attention is a substantial mechanism for human to process massive data. It omits the trivial parts a...
Image captioning and visual language grounding are two important tasks for image understanding, but ...
Understanding images requires rich background knowledge that is not often written down and hard for ...
Visual attention plays an important role to understand images and demonstrates its effectiveness in ...
Video captioning has become a broad and interesting research area. Attention-based encoder-decoder m...
With the advancement in deep learning, the consolidation of text classification and image processing...
Conceptual interpretation of languages has gathered peak interest in the world of artificial intelli...
Powered by deep convolutional networks and large scale visual datasets, modern computer vision syste...
Each time we ask for an object, describe a scene, follow directions or read a document containi...
Given an unstructured collection of captioned images of cluttered scenes featuring a variety of obje...
This paper presents a novel approach for automatically generating image descriptions: visual detecto...
This paper presents a novel approach for automatically generating image descriptions: visual detecto...
Automatic image caption prediction is a challenging task in natural language processing. Most of the...
Visual attention plays an important role to understand images and demonstrates its effectiveness in ...
In this thesis, we propose novel deep learning algorithms for the vision and language tasks, includi...
Attention is a substantial mechanism for human to process massive data. It omits the trivial parts a...
Image captioning and visual language grounding are two important tasks for image understanding, but ...
Understanding images requires rich background knowledge that is not often written down and hard for ...
Visual attention plays an important role to understand images and demonstrates its effectiveness in ...
Video captioning has become a broad and interesting research area. Attention-based encoder-decoder m...
With the advancement in deep learning, the consolidation of text classification and image processing...
Conceptual interpretation of languages has gathered peak interest in the world of artificial intelli...
Powered by deep convolutional networks and large scale visual datasets, modern computer vision syste...