Learning representations of multimodal data that are both informative and robust to missing modalities at test time remains a challenging problem, owing to the inherent heterogeneity of data obtained from different channels. To address it, we present a novel Geometric Multimodal Contrastive (GMC) representation learning method consisting of two main components: i) a two-level architecture comprising modality-specific base encoders, which map an arbitrary number of modalities to intermediate representations of fixed dimensionality, and a shared projection head, which maps the intermediate representations to a latent representation space; ii) a multimodal contrastive loss function that encourages the geometric alignment of the learne...
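The alignment objective sketched in this abstract can be illustrated with a minimal numpy example: an InfoNCE-style contrastive loss that pulls each modality-specific embedding toward the joint (complete) embedding of the same sample while pushing it away from other samples in the batch. The function name, temperature value, and toy data below are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Project embeddings onto the unit hypersphere so that dot
    # products become cosine similarities.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def gmc_style_contrastive_loss(z_mod, z_joint, temperature=0.1):
    """InfoNCE-style loss aligning modality-specific embeddings z_mod
    with joint embeddings z_joint. Positive pairs share a batch index;
    every other pair in the batch acts as a negative.
    (Illustrative sketch, not the authors' exact formulation.)"""
    z_mod = l2_normalize(z_mod)
    z_joint = l2_normalize(z_joint)
    logits = z_mod @ z_joint.T / temperature        # (B, B) similarities
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))              # positives on the diagonal

# Toy check: embeddings geometrically close to the joint representation
# should incur a lower loss than random, unaligned embeddings.
rng = np.random.default_rng(0)
B, d = 8, 16
z_joint = rng.normal(size=(B, d))
aligned = gmc_style_contrastive_loss(z_joint + 0.01 * rng.normal(size=(B, d)), z_joint)
unaligned = gmc_style_contrastive_loss(rng.normal(size=(B, d)), z_joint)
print(aligned < unaligned)
```

In a full two-level setup, `z_mod` would come from a modality-specific base encoder followed by the shared projection head, and `z_joint` from encoding the complete set of modalities through the same head.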
Human perception of the empirical world involves recognizing the diverse appearances, or 'modalities...
Image-text multimodal representation learning aligns data across modalities and enables important me...
In practical machine learning settings, there often exist relations or links between data from diffe...
We present modality gap, an intriguing geometric phenomenon of the representation space of multi-mod...
Some self-supervised cross-modal learning approaches have recently demonstrated the potential of ima...
Multi-modal Contrastive Representation learning aims to encode different modalities into a semantica...
Multi-modal contrastive representation (MCR) of more than three modalities is critical in multi-moda...
Learning joint embedding space for various modalities is of vital importance for multimodal fusion. ...
A number of variational autoencoders (VAEs) have recently emerged with the aim of modeling multimoda...
Contrastive Learning has recently received interest due to its success in self-supervised representa...
A key competence for open-ended learning is the formation of increasingly abstract representations u...
Humans view the world through many sensory channels, e.g., the long-wavelength light channel, viewed...
Making each modality in multi-modal data contribute is of vital importance to learning a versatile m...
Many real-world problems are inherently multimodal, from the communicative modalities humans use to ...