In query-by-semantic-example image retrieval, images are ranked by the similarity of their semantic descriptors. These descriptors are obtained by classifying each image with respect to a pre-defined vocabulary of semantic concepts. In this work, we consider the problem of improving the accuracy of semantic descriptors through cross-modal regularization based on auxiliary text. A three-step cross-modal regularizer is proposed. Training images and text are first mapped to a common semantic space. A regularization operator is then learned for each concept in the semantic vocabulary; it maps the semantic descriptors of images labeled with that concept to the descriptors of the associated texts. A convex formulation...
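The ranking step described in the abstract can be illustrated with a minimal sketch. It assumes that a semantic descriptor is a vector of concept probabilities obtained by applying a softmax to classifier scores, and that similarity is measured by cosine similarity; both choices, along with the image names and score values, are hypothetical and are not taken from the paper. The cross-modal regularization itself is not shown, since its formulation is elided above.

```python
import math

def softmax(scores):
    """Turn raw classifier scores into a probability vector over concepts."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(p, q):
    """Cosine similarity between two semantic descriptors (assumed metric)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)

def rank_by_semantic_example(query, database):
    """Return database image names sorted from most to least similar."""
    sims = {name: cosine(query, desc) for name, desc in database.items()}
    return sorted(sims, key=sims.get, reverse=True)

# Hypothetical classifier scores over a three-concept vocabulary.
database = {
    "img_a": softmax([2.0, 0.1, 0.1]),
    "img_b": softmax([0.1, 2.0, 0.1]),
    "img_c": softmax([1.5, 0.5, 0.1]),
}
query = softmax([1.8, 0.2, 0.1])

ranking = rank_by_semantic_example(query, database)
print(ranking)
```

Because retrieval operates on the descriptors alone, any improvement in descriptor accuracy carries over directly to the ranking.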
Images without annotations are ubiquitous on the Internet, and recommending tags for them has become...
Visual Semantic Embedding (VSE) networks aim to extract the semantics of images and their descriptio...
The problem of jointly modeling the text and image components of multimedia documents is studied. The...
Semantic representations of images have been widely adopted in Computer Vision. A vocabulary of conc...
Cross-modal retrieval has recently become a research hot spot, thanks to the development of ...
A novel image representation, termed semantic image representation, that incorporates contextual inf...
Cross-modal retrieval is an important field of research today because of the abundance of multi-medi...
Visual-semantic embeddings have been extensively used as a powerful model for cross-modal retrieval ...
The paradox of visual polysemia and concept polymorphism has been a great challenge in the large sc...
Current cross-modal retrieval systems are evaluated using the R@K measure, which does not leverage semant...
Nowadays, the heterogeneity gap of different modalities is the key problem for cross-modal retrieval...
Cross-modal retrieval aims to find relevant data of different modalities, such as images and text. I...
Most machine learning applications involve a domain shift between data on which a model has initiall...
Cross-modal retrieval is such a challenging topic that traditional global representations would fail...