Given an unstructured collection of captioned images of cluttered scenes featuring a variety of objects, our goal is to simultaneously learn the names and appearances of the objects. Only a small fraction of local features within any given image are associated with a particular caption word, and captions may contain irrelevant words not associated with any image object. We propose a novel algorithm that uses the repetition of feature neighborhoods across training images and a measure of correspondence with caption words to learn meaningful feature configurations (representing named objects). We also introduce a graph-based appearance model that captures some of the structure of an object by encoding the spatial relationships among the local...
Throughout the thesis, I demonstrate how each of the proposed methods can bridge the gap between ima...
International audienceWe propose ``Areas of Attention'', a novel attention-based model for automatic...
Spatial understanding is crucial in many real-world problems, yet little progress has been made towa...
Given an unstructured collection of captioned images of cluttered scenes featuring a variety of obje...
Jamieson M, Fazly A, Stevenson S, Dickinson S, Wachsmuth S. Using Language to Learn Structured Appea...
Given an unstructured collection of captioned images of cluttered scenes featuring a variety of obje...
This paper presents a novel approach for automatically generating image descriptions: visual detecto...
This paper presents a novel approach for automatically generating image descriptions: visual detecto...
Powered by deep convolutional networks and large scale visual datasets, modern computer vision syste...
Each time we ask for an object, describe a scene, follow directions or read a document containi...
We present a method of recognizing three-dimensional objects in intensity images of cluttered scene...
Compared to natural images, understanding scientific figures is particularly hard for machines. Howe...
Visual attention mechanism has been widely used by image captioning model in order to dynamically at...
This paper presents the integration of natural language processing and computer vision to improve th...
Computer vision is moving from predicting discrete, categorical labels to generating rich descriptio...
Throughout the thesis, I demonstrate how each of the proposed methods can bridge the gap between ima...
International audienceWe propose ``Areas of Attention'', a novel attention-based model for automatic...
Spatial understanding is crucial in many real-world problems, yet little progress has been made towa...
Given an unstructured collection of captioned images of cluttered scenes featuring a variety of obje...
Jamieson M, Fazly A, Stevenson S, Dickinson S, Wachsmuth S. Using Language to Learn Structured Appea...
Given an unstructured collection of captioned images of cluttered scenes featuring a variety of obje...
This paper presents a novel approach for automatically generating image descriptions: visual detecto...
This paper presents a novel approach for automatically generating image descriptions: visual detecto...
Powered by deep convolutional networks and large scale visual datasets, modern computer vision syste...
Each time we ask for an object, describe a scene, follow directions or read a document containi...
We present a method of recognizing three-dimensional objects in intensity images of cluttered scene...
Compared to natural images, understanding scientific figures is particularly hard for machines. Howe...
Visual attention mechanism has been widely used by image captioning model in order to dynamically at...
This paper presents the integration of natural language processing and computer vision to improve th...
Computer vision is moving from predicting discrete, categorical labels to generating rich descriptio...
Throughout the thesis, I demonstrate how each of the proposed methods can bridge the gap between ima...
International audienceWe propose ``Areas of Attention'', a novel attention-based model for automatic...
Spatial understanding is crucial in many real-world problems, yet little progress has been made towa...