This paper presents the integration of natural language processing and computer vision to improve the syntax of the language generated when describing objects in images. The goal was to not only understand the objects in an image, but the interactions and activities occurring between the objects. We implemented a multi-modal neural network combining convolutional and recurrent neural network architectures to create a model that can maximize the likelihood of word combinations given a training image. The outcome was an image captioning model that leveraged transfer learning techniques for architecture components. Our novelty was to quantify the effectiveness of transfer learning schemes for encoders and decoders to qualify which were the bes...
We introduce two multimodal neural lan-guage models: models of natural language that can be conditio...
Automatic image caption prediction is a challenging task in natural language processing. Most of the...
Given an unstructured collection of captioned images of cluttered scenes featuring a variety of obje...
Throughout the thesis, I demonstrate how each of the proposed methods can bridge the gap between ima...
Throughout the thesis, I demonstrate how each of the proposed methods can bridge the gap between ima...
This thesis focuses on proposing and addressing various tasks in the field of vision and language, a...
This thesis focuses on proposing and addressing various tasks in the field of vision and language, a...
Developing intelligent agents that can perceive and understand the rich visual world around us has b...
Powered by deep convolutional networks and large scale visual datasets, modern computer vision syste...
Texts and images provide alternative, yet orthogonal views of the same underlying cognitive concept....
Powered by deep convolutional networks and large scale visual datasets, modern computer vision syste...
Each time we ask for an object, describe a scene, follow directions or read a document containi...
[ANGLÈS] The field of computer vision has radically evolved in the last few years due to the success...
[ANGLÈS] The field of computer vision has radically evolved in the last few years due to the success...
As one of the most intelligent beings on the planet, we are equipped with the most powerful visual a...
We introduce two multimodal neural lan-guage models: models of natural language that can be conditio...
Automatic image caption prediction is a challenging task in natural language processing. Most of the...
Given an unstructured collection of captioned images of cluttered scenes featuring a variety of obje...
Throughout the thesis, I demonstrate how each of the proposed methods can bridge the gap between ima...
Throughout the thesis, I demonstrate how each of the proposed methods can bridge the gap between ima...
This thesis focuses on proposing and addressing various tasks in the field of vision and language, a...
This thesis focuses on proposing and addressing various tasks in the field of vision and language, a...
Developing intelligent agents that can perceive and understand the rich visual world around us has b...
Powered by deep convolutional networks and large scale visual datasets, modern computer vision syste...
Texts and images provide alternative, yet orthogonal views of the same underlying cognitive concept....
Powered by deep convolutional networks and large scale visual datasets, modern computer vision syste...
Each time we ask for an object, describe a scene, follow directions or read a document containi...
[ANGLÈS] The field of computer vision has radically evolved in the last few years due to the success...
[ANGLÈS] The field of computer vision has radically evolved in the last few years due to the success...
As one of the most intelligent beings on the planet, we are equipped with the most powerful visual a...
We introduce two multimodal neural lan-guage models: models of natural language that can be conditio...
Automatic image caption prediction is a challenging task in natural language processing. Most of the...
Given an unstructured collection of captioned images of cluttered scenes featuring a variety of obje...