Pre-Trained Vision-Language Models (VL-PTMs) have shown promising capabilities in grounding natural language in image data, facilitating a broad variety of cross-modal tasks. However, we note that there exists a significant gap between the objective forms of model pre-training and fine-tuning, resulting in a need for large amounts of labeled data to stimulate the visual grounding capability of VL-PTMs for downstream tasks. To address the challenge, we present Cross-modal Prompt Tuning (CPT, alternatively, Colorful Prompt Tuning), a novel paradigm for tuning VL-PTMs, which reformulates visual grounding into a fill-in-the-blank problem with color-based co-referential markers in image and text, maximally mitigating the gap. In this way, CPT en...
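The color-based prompting idea described above can be illustrated with a short sketch: candidate regions are overlaid with distinct translucent colors, the query is cast as a fill-in-the-blank template, and the region whose color word best fills the blank is selected. The region proposals, the "[MASK] color" template, and the `score_mask_token` callback (a stand-in for the masked-prediction head of a VL-PTM) are illustrative assumptions, not the exact CPT implementation.

```python
# Minimal sketch of color-based co-referential markers for visual grounding.
# Template wording, colors, and the scoring callback are illustrative only.
from typing import Callable, Dict, List, Tuple
from PIL import Image, ImageDraw

# color name -> RGBA overlay used as a co-referential marker
COLOR_MARKERS: Dict[str, Tuple[int, int, int, int]] = {
    "red":   (255, 0, 0, 96),
    "green": (0, 255, 0, 96),
    "blue":  (0, 0, 255, 96),
}

def paint_regions(image: Image.Image,
                  boxes: List[Tuple[int, int, int, int]]) -> Tuple[Image.Image, List[str]]:
    """Overlay a distinct translucent color on each candidate region."""
    canvas = image.convert("RGBA")
    overlay = Image.new("RGBA", canvas.size, (0, 0, 0, 0))
    draw = ImageDraw.Draw(overlay)
    colors = list(COLOR_MARKERS)[: len(boxes)]
    for box, name in zip(boxes, colors):
        draw.rectangle(box, fill=COLOR_MARKERS[name])
    return Image.alpha_composite(canvas, overlay), colors

def ground_by_color(image: Image.Image,
                    boxes: List[Tuple[int, int, int, int]],
                    query: str,
                    score_mask_token: Callable[[Image.Image, str, str], float]) -> int:
    """Return the index of the box whose color word best fills the blank."""
    painted, colors = paint_regions(image, boxes)
    prompt = f"{query} is in [MASK] color."   # fill-in-the-blank template
    scores = [score_mask_token(painted, prompt, c) for c in colors]
    return max(range(len(scores)), key=scores.__getitem__)
```

The `score_mask_token` callback is where a VL-PTM's masked language modeling head would plug in, returning the probability of a given color word at the blank position; everything else is ordinary image manipulation.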
Since the rise of powerful large-scale pre-trained Vision-Language (VL) models, such as CLIP and ALI...
We present a new paradigm for fine-tuning large-scale vision-language pre-trained models on downstre...
Large-scale pre-trained Vision-Language Models (VLMs), such as CLIP, establish the correlation betwe...
As the Transformer architecture evolves, pre-trained models have advanced at a breakneck pace in recent years. They h...
Contrastive Language-Image Pre-training (CLIP) has drawn increasing attention recently for its trans...
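As a concrete illustration of the transfer setting referred to above, the following sketch performs zero-shot classification with CLIP through the Hugging Face transformers interface; the checkpoint name, image path, and label prompts are placeholders chosen for illustration.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")                   # placeholder path to any RGB image
labels = ["a photo of a cat", "a photo of a dog"]   # prompt-style class descriptions

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image       # image-text similarity scores
probs = logits.softmax(dim=-1)
print(dict(zip(labels, probs[0].tolist())))
```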
Large pre-trained vision-language models like CLIP have shown great potential in learning representa...
Pretrained models have achieved great success in both Computer Vision (CV) and Natural Language Proc...
Vision-language pre-training (VLP) has shown impressive performance on a wide range of cross-modal t...
The current modus operandi in adapting pre-trained models involves updating all the backbone paramet...
Most existing vision-language pre-training (VLP) approaches adopt cross-modal masked language modeli...
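To make the masked language modeling objective mentioned above concrete, a bare-bones sketch of the text-side masking step is given below; the 15% masking rate and the special token ids follow the usual BERT-style convention and are illustrative only (the cross-modal part, conditioning on the image, happens inside the model and is not shown).

```python
# Bare-bones sketch of masking caption tokens for an MLM-style objective.
import torch

MASK_ID, IGNORE = 103, -100   # illustrative special token / ignore ids

def mask_caption(token_ids: torch.Tensor, p: float = 0.15):
    """Randomly mask caption tokens; labels keep only the masked positions."""
    ids = token_ids.clone()
    labels = token_ids.clone()
    masked = torch.rand(ids.shape) < p
    ids[masked] = MASK_ID          # the model reconstructs these, conditioned
    labels[~masked] = IGNORE       # on the image and the unmasked text
    return ids, labels

ids, labels = mask_caption(torch.randint(1000, 2000, (1, 12)))
```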
With recent progress in joint modeling of visual and textual representations, Vision-Language Pretra...
Vision-language pre-training (VLP) methods have blossomed recently, and their crucial goal is to joint...
Prompt tuning has become a new paradigm for model tuning and has demonstrated success in natural ...
In recent years, prompt tuning has proven effective in adapting pre-trained vision-language models t...
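The following toy sketch illustrates the general soft-prompt recipe such methods build on: the backbone is kept frozen and only a few continuous context vectors, prepended to the class-name embeddings, are optimized. The tiny Transformer encoder, the dimensions, and the mean pooling are stand-ins for a real VL model's text tower, not any particular method's implementation.

```python
# Toy sketch of soft prompt tuning: only the learnable context vectors train.
import torch
import torch.nn as nn

class SoftPromptClassifier(nn.Module):
    def __init__(self, text_encoder: nn.Module, class_embeds: torch.Tensor,
                 n_ctx: int = 4, dim: int = 512):
        super().__init__()
        self.text_encoder = text_encoder
        for p in self.text_encoder.parameters():           # freeze the backbone
            p.requires_grad_(False)
        self.register_buffer("class_embeds", class_embeds)  # [n_cls, n_tok, dim]
        self.ctx = nn.Parameter(torch.randn(n_ctx, dim) * 0.02)  # learnable prompt

    def class_features(self) -> torch.Tensor:
        n_cls = self.class_embeds.size(0)
        ctx = self.ctx.unsqueeze(0).expand(n_cls, -1, -1)
        tokens = torch.cat([ctx, self.class_embeds], dim=1)  # [prompt; class name]
        return self.text_encoder(tokens).mean(dim=1)         # pooled text features

    def forward(self, image_feats: torch.Tensor) -> torch.Tensor:
        text_feats = nn.functional.normalize(self.class_features(), dim=-1)
        image_feats = nn.functional.normalize(image_feats, dim=-1)
        return image_feats @ text_feats.t()                   # cosine-similarity logits

# Usage with a stand-in frozen encoder: only `model.ctx` receives gradients.
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True), num_layers=2)
model = SoftPromptClassifier(encoder, class_embeds=torch.randn(3, 6, 512))
logits = model(torch.randn(8, 512))                           # 8 images x 3 classes
loss = nn.functional.cross_entropy(logits, torch.randint(0, 3, (8,)))
loss.backward()
```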
Large-scale pretraining is fast becoming the norm in Vision-Language (VL) modeling. However, prevail...