In this paper, we target fine-grained image–text alignment and cross-modal retrieval in the cultural heritage domain through two tasks: (1) given an image fragment of an artwork, we retrieve the noun phrases that describe it; (2) given a noun phrase describing an artifact attribute, we retrieve the image fragment it specifies. To this end, we propose a weakly supervised alignment model in which the correspondence between the visual and textual fragments in the training input is not known, but the units that refer to the same artwork are treated as a positive pair. The model exploits the latent alignment between fragments across modalities using attention mechanisms by first projecting them into a shared semantic space; the...
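The idea in the abstract above (project fragments of both modalities into a shared space, let attention discover the latent fragment alignment, and supervise only at the artwork level) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration and not taken from the paper: the linear projections `W_img` and `W_txt`, the `temperature` value, the mean pooling over phrases, and the max-margin `ranking_loss` are stand-ins for details the truncated abstract does not give.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to (approximately) unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def artwork_score(img_frags, txt_frags, W_img, W_txt, temperature=0.1):
    """Score one (image, text) pair from its fragments.

    img_frags: (n_regions, d_img) image fragment features
    txt_frags: (n_phrases, d_txt) noun phrase features
    W_img, W_txt: hypothetical linear projections into a shared d-dim space
    """
    v = l2_normalize(img_frags @ W_img)        # (n_regions, d)
    t = l2_normalize(txt_frags @ W_txt)        # (n_phrases, d)
    sim = t @ v.T                              # cosine similarities
    # Each phrase attends over all image fragments; no fragment-level
    # labels are used, so the alignment stays latent.
    attn = softmax(sim / temperature, axis=1)  # (n_phrases, n_regions)
    attended = l2_normalize(attn @ v)          # attended visual vector per phrase
    phrase_scores = np.sum(t * attended, axis=1)
    # Assumed pooling: average phrase scores into one artwork-level score.
    return float(phrase_scores.mean()), attn


def ranking_loss(pos_score, neg_score, margin=0.2):
    """Assumed max-margin loss: same-artwork pairs should outscore mismatched ones."""
    return max(0.0, margin - pos_score + neg_score)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W_img = rng.normal(size=(16, 8))
    W_txt = rng.normal(size=(12, 8))
    regions = rng.normal(size=(5, 16))      # fragments of one artwork image
    phrases = rng.normal(size=(3, 12))      # noun phrases from its description
    other_phrases = rng.normal(size=(4, 12))  # phrases from another artwork

    pos, attn = artwork_score(regions, phrases, W_img, W_txt)
    neg, _ = artwork_score(regions, other_phrases, W_img, W_txt)
    print("attention shape:", attn.shape)
    print("loss:", ranking_loss(pos, neg))
```

At retrieval time, a row of `attn` ranks image fragments for a given noun phrase, and a column ranks noun phrases for a given image fragment, which corresponds to the two retrieval directions the abstract describes.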
Cross-modal retrieval has been attracting increasing attention because of the explosion of multi-mod...
Cross-modal retrieval aims to enable a flexible retrieval experience across different modalities (e.g....
The problem of cross-modal retrieval from multimedia repositories is considered. This problem addres...
Visual-semantic embeddings have been extensively used as a powerful model for cross-modal retrieval ...
Multimodal machine learning involving textual and visual data is a fundamental research topic in the...
Cross-modal retrieval has attracted widespread attention in many cross-media similarity search appli...
Despite the evolution of deep-learning-based visual-textual processing systems, precise multi-modal ...
Cross-modal retrieval aims to find relevant data of different modalities, such as images and text. I...
The problem of jointly modeling the text and image components of multimedia documents is studied. The...
Zero-Shot Cross-Modal Retrieval (ZS-CMR) is an emerging research hotspot that aims to retrieve data ...
The goal of cross-modal retrieval is that the user can give any sample as a query, and the syste...
Cross-modal retrieval is an important field of research today because of the abundance of multi-medi...
The cross-modal retrieval task can return nearest neighbors from different modalities, such as image or text. ...
This article focuses on the task of cross-modal image-text retrieval, which has been an ...