Transformer reasoning network for image-text matching and retrieval

Messina Nicola
Falchi Fabrizio
Esuli Andrea
Amato Giuseppe

Publication date

January 2021

DOI

10.1109/ICPR48806.2021.9413172

Abstract

Image-text matching is an interesting and fascinating task in modern AI research. Despite advances in deep-learning-based image and text processing systems, multi-modal matching remains a challenging problem. In this work, we consider the problem of accurate image-text matching for the task of multi-modal large-scale information retrieval. State-of-the-art results in image-text matching are achieved by letting image and text features from the two different processing pipelines interact, usually through mutual attention mechanisms. However, this rules out extracting the separate visual and textual features that later indexing steps in large-scale retrieval systems need. In this regard, we introduce the Transformer Encoder Reasoning...
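The indexing argument above can be illustrated with a small sketch. It assumes that images and texts are encoded independently by two separate encoders (not shown; the toy vectors below are hypothetical stand-ins for their outputs). Because no image vector depends on any query, image vectors can be normalized and stored offline, and a text query is then ranked against them by cosine similarity. This is a generic illustration of why separable features suit large-scale retrieval, not the method proposed in the paper; the function names `build_index` and `search` are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def build_index(image_features: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Offline step: each image vector is computed without seeing any text,
    so it can be normalized once and stored for all future queries."""
    return {img_id: l2_normalize(f) for img_id, f in image_features.items()}


def search(index: Dict[str, List[float]], query_feature: List[float],
           k: int = 2) -> List[Tuple[str, float]]:
    """Online step: score a text query vector against every stored image
    vector and return the k best (image id, cosine similarity) pairs."""
    q = l2_normalize(query_feature)
    scores = [(img_id, sum(a * b for a, b in zip(q, v)))
              for img_id, v in index.items()]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Hypothetical toy features standing in for independent encoder outputs.
image_features = {
    "img_a": [1.0, 0.0, 0.0],
    "img_b": [0.0, 1.0, 0.0],
    "img_c": [0.7, 0.7, 0.0],
}
text_query = [0.9, 0.1, 0.0]

results = search(build_index(image_features), text_query, k=2)
print(results)
```

With mutual attention, by contrast, each image representation would depend on the specific query text, so the index could not be precomputed and every image would have to be re-encoded per query.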
