This paper considers contrastive training for cross-modal 0-shot transfer wherein a pre-trained model in one modality is used for representation learning in another domain using pairwise data. The learnt models in the latter domain can then be used for a diverse set of tasks in a zero-shot way, similar to ``Contrastive Language-Image Pre-training (CLIP)'' and ``Locked-image Tuning (LiT)'' that have recently gained considerable attention. Most existing works for cross-modal representation alignment (including CLIP and LiT) use the standard contrastive training objective, which employs sets of positive and negative examples to align similar and repel dissimilar training data samples. However, similarity amongst training examples has a more co...
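For context, the ``standard contrastive training objective'' referred to above is commonly written in the symmetric form used by CLIP and LiT. The equation below is a reference statement of that standard objective, not the formulation proposed in this paper; the notation ($u_i$, $v_i$ for the normalized embeddings of the $i$-th paired samples from the two modalities, $\tau$ for the temperature, $N$ for the batch size) is introduced here only for illustration.
\[
\mathcal{L}_{\mathrm{contrastive}} \;=\; -\frac{1}{2N}\sum_{i=1}^{N}\left[\log\frac{\exp(u_i^{\top} v_i/\tau)}{\sum_{j=1}^{N}\exp(u_i^{\top} v_j/\tau)} \;+\; \log\frac{\exp(u_i^{\top} v_i/\tau)}{\sum_{j=1}^{N}\exp(u_j^{\top} v_i/\tau)}\right].
\]
Under this objective, each anchor treats only its paired sample as a positive and the remaining $N-1$ samples in the batch as negatives, i.e., similarity between training examples is handled as strictly binary.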
Most machine learning applications involve a domain shift between data on which a model has initiall...
The Contrastive Language-Image Pre-training (CLIP) Model is a recently proposed large-scale pre-trai...
Zero-shot learning (ZSL) aims to predict unseen classes whose samples have never appeared during tra...
Recent advances in contrastive representation learning over paired image-text data have led to model...
Large-scale single-stream pre-training has shown dramatic performance in image-text retrieval. Regre...
Multi-modal Contrastive Representation learning aims to encode different modalities into a semantica...
Recently, the cross-modal pre-training task has been a hotspot because of its wide application in va...
Cross-modal attention mechanisms have been widely applied to the image-text matching task and have a...
This paper presents contrastive-tuning, a simple method employing contrastive training to align imag...
Large vision-language representation learning models like CLIP have demonstrated impressive performa...
The heterogeneity gap problem is the main challenge in cross-modal retrieval. Because cross-modal da...
Contrastive learning is a form of distance learning that aims to learn invariant features from two r...
Recent Vision-Language Pre-trained (VLP) models based on dual encoder have attracted extensive atten...
The ability to quickly learn a new task with minimal instruction - known as few-shot learning - is a...