Text-based person search (TBPS), which aims to retrieve pedestrian images with high semantic relevance to a given text description, is of significant importance in intelligent surveillance. This retrieval task is characterized by both modal heterogeneity and fine-grained matching. To implement this task, one needs to extract multi-scale features from both the image and text domains and then perform cross-modal alignment. However, most existing approaches only consider alignment confined to individual scales, e.g., the image-sentence or the region-phrase scale. Such a strategy adopts a presumed alignment in feature extraction while overlooking cross-scale alignment, e.g., image-phrase. In this paper, we present a transforme...
Cross-modal retrieval is an important functionality in modern search engines, as it increases the us...
Current state-of-the-art approaches to cross-modal retrieval process text and visual input jointly,...
Text-based person search aims to retrieve the corresponding person images in an image database by vi...
Despite the evolution of deep-learning-based visual-textual processing systems, precise multi-modal ...
We consider the problem of person search in unconstrained sc...
Cross-modal retrieval has attracted widespread attention in many cross-media similarity search appli...
This article focuses on tackling the task of cross-modal image-text retrieval, which has been an ...
Recent advances of person re-identification have well advocated the usage of human body cues to boos...
Given a descriptive text query, text-based person search (TBPS) aims to retrieve the best-matched ta...
In this paper we report on our experiments on aligning names and faces as found in images and captio...
Vision-language alignment learning for video-text retrieval has attracted a lot of attention in recent yea...
In this paper, we target the tasks of fine-grained image–text alignment and cross-modal retrieval in...