In this paper we report on our experiments on aligning names and faces as found in images and captions of online news websites. Developing accurate technologies for linking names and faces is valuable when retrieving or mining information from multimedia collections. We perform exhaustive and systematic experiments exploiting the (a)symmetry between the visual and textual modalities. This leads to different schemes for assigning names to the faces, assigning faces to the names, and establishing name-face link pairs. On top of that, we investigate generic approaches to the use of textual and visual structural information to predict the presence of the corresponding entity in the other modality. The proposed methods are completely unsupervis...
We present a model that generates natural language de-scriptions of images and their regions. Our ap...
International audienceWith the advent of end-to-end deep learning approaches in machine translation,...
Learning word alignments between parallel sentence pairs is an important task in Statistical Machine...
In this paper we report on our experiments on linking names and faces as found in images and caption...
Vision-language alignment learning for video-text retrieval arouses a lot of attention in recent yea...
We show that a large and realistic face dataset can be built from news photographs and their associa...
We have been developing Name-It, a system that associates faces and names in news videos. First, as ...
We have been developing Name-It, a system that associates faces and names in news videos. First, as ...
Abstract. Facial landmark localization plays an important role for many computer vision tasks, e.g.,...
We propose an unsupervised learning algorithm for automatically inferring the mappings between Engli...
We present a model that generates natural language de-scriptions of images and their regions. Our ap...
Despite the evolution of deep-learning-based visual-textual processing systems, precise multi-modal ...
Despite the evolution of deep-learning-based visual-textual processing systems, precise multi-modal ...
Unsupervised joint alignment of images has been demonstrated to improve performance on recognition t...
The invited presentation motivates research for content alignment in text and images. It describes p...
We present a model that generates natural language de-scriptions of images and their regions. Our ap...
International audienceWith the advent of end-to-end deep learning approaches in machine translation,...
Learning word alignments between parallel sentence pairs is an important task in Statistical Machine...
In this paper we report on our experiments on linking names and faces as found in images and caption...
Vision-language alignment learning for video-text retrieval arouses a lot of attention in recent yea...
We show that a large and realistic face dataset can be built from news photographs and their associa...
We have been developing Name-It, a system that associates faces and names in news videos. First, as ...
We have been developing Name-It, a system that associates faces and names in news videos. First, as ...
Abstract. Facial landmark localization plays an important role for many computer vision tasks, e.g.,...
We propose an unsupervised learning algorithm for automatically inferring the mappings between Engli...
We present a model that generates natural language de-scriptions of images and their regions. Our ap...
Despite the evolution of deep-learning-based visual-textual processing systems, precise multi-modal ...
Despite the evolution of deep-learning-based visual-textual processing systems, precise multi-modal ...
Unsupervised joint alignment of images has been demonstrated to improve performance on recognition t...
The invited presentation motivates research for content alignment in text and images. It describes p...
We present a model that generates natural language de-scriptions of images and their regions. Our ap...
International audienceWith the advent of end-to-end deep learning approaches in machine translation,...
Learning word alignments between parallel sentence pairs is an important task in Statistical Machine...