We construct multi-modal concept representations by concatenating a skip-gram linguistic representation vector with a visual concept representation vector computed using the feature extraction layers of a deep convolutional neural network (CNN) trained on a large labeled object recognition dataset. This transfer learning approach brings a clear performance gain over features based on the traditional bag-of-visual-words approach. Experimental results are reported on the WordSim353 and MEN semantic relatedness evaluation tasks. We use visual features computed using either ImageNet or ESP Game images.
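The fusion step described above can be sketched as follows. This is a minimal illustration, not the paper's exact pipeline: it assumes each modality's vector is L2-normalized before concatenation (a common choice in multi-modal fusion; the paper's precise weighting and dimensionalities may differ), and the 300-d/4096-d sizes are placeholders for typical skip-gram and CNN feature dimensions.

```python
import numpy as np

def multimodal_concat(linguistic_vec, visual_vec):
    """Fuse a skip-gram vector with a CNN feature vector by
    L2-normalizing each modality and concatenating them.
    Normalization keeps one modality from dominating the
    cosine similarities used in relatedness evaluation."""
    lin = linguistic_vec / np.linalg.norm(linguistic_vec)
    vis = visual_vec / np.linalg.norm(visual_vec)
    return np.concatenate([lin, vis])

# Toy example: a 300-d linguistic vector and a 4096-d visual
# feature vector (hypothetical sizes) yield a 4396-d concept vector.
rng = np.random.default_rng(0)
fused = multimodal_concat(rng.standard_normal(300),
                          rng.standard_normal(4096))
print(fused.shape)  # (4396,)
```

Because each half has unit norm, cosine similarity between two fused vectors weights the linguistic and visual modalities equally.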