While multilingual vision-language pretrained models have brought several benefits, recent benchmarks across various tasks and languages show poor cross-lingual generalisation when these models are applied to non-English data, with a large gap between (supervised) English performance and (zero-shot) cross-lingual transfer. In this work, we explore the poor performance of these models on a zero-shot cross-lingual visual question answering (VQA) task, where models are fine-tuned on English visual question data and evaluated on 7 typologically diverse languages. We improve cross-lingual transfer with three strategies: (1) we introduce a linguistic prior objective to augment the cross-entropy loss wit...
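The abstract above is truncated, so the exact form of the augmented objective is not visible here. The snippet below is only a minimal, hypothetical sketch of the general idea of pairing the standard VQA cross-entropy loss with an auxiliary alignment term; the function name, the cosine-similarity choice, and the `alpha` weight are illustrative assumptions, not the paper's actual formulation.

```python
# Hypothetical sketch: cross-entropy on answers plus a similarity-based
# auxiliary term that encourages English and target-language question
# representations to align. All names and weights are illustrative.
import torch
import torch.nn.functional as F


def vqa_loss_with_prior(answer_logits, answer_targets,
                        en_question_emb, xx_question_emb, alpha=0.1):
    """answer_logits:   (batch, num_answers) scores from the VQA head
    answer_targets:  (batch,) gold answer indices
    en_question_emb: (batch, dim) pooled English question embeddings
    xx_question_emb: (batch, dim) pooled target-language question embeddings
    alpha:           weight of the auxiliary term (assumed hyper-parameter)
    """
    # Standard supervised VQA objective.
    ce = F.cross_entropy(answer_logits, answer_targets)
    # 1 - cosine similarity: 0 when the two question embeddings coincide.
    sim = 1.0 - F.cosine_similarity(en_question_emb, xx_question_emb, dim=-1).mean()
    return ce + alpha * sim


if __name__ == "__main__":
    # Toy usage with random tensors (vocabulary and hidden sizes are arbitrary).
    logits = torch.randn(4, 3129)
    targets = torch.randint(0, 3129, (4,))
    en = torch.randn(4, 768)
    xx = torch.randn(4, 768)
    print(vqa_loss_with_prior(logits, targets, en, xx).item())
```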
Using deep learning, computer vision now rivals people at object recognition and detection, opening ...
Recent advancements in multimodal techniques open exciting possibilities for models excelling in div...
The scarcity of data presents a critical obstacle to the efficacy of medical vision-language pre-trai...
Pre-trained multilingual language models show significant performance gains for zero-shot cross-ling...
Recent cross-lingual cross-modal works attempt to extend Vision-Language Pre-training (VLP) models t...
Prior work on multilingual question answering has mostly focused on using large multilingual pre-tra...
This paper introduces our proposed system for the MIA Shared Task on Cross-lingual Open retrieval Qu...
Reliable evaluation benchmarks designed for replicability and comprehensiveness have driven progress...
Cross-lingual transfer learning with large multilingual pre-trained models can be an effective appro...
Cross-lingual Machine Reading Comprehension (xMRC) is a challenging task due to the lack of training...
Vision language pre-training aims to learn alignments between vision and language from a large amoun...
Pre-Trained Vision-Language Models (VL-PTMs) have shown promising capabilities in grounding natural ...
Large-scale cross-lingual language models (LM), such as mBERT, Unicoder and XLM, have achieved great...
Multilingual language models exhibit better performance for some languages than for others (Singh et...