Deep vision multimodal learning aims at combining deep visual representation learning with other modalities, such as text, sound, and data collected from other sensors. With the fast development of deep learning, vision multimodal learning has gained much interest from the community. This paper reviews the types of architectures used in multimodal learning, including feature extraction, modality aggregation, and multimodal loss functions. Then, we discuss several learning paradigms such as supervised, semi-supervised, self-supervised, and transfer learning. We also introduce several practical challenges such as missing modalities and noisy modalities. Several applications and benchmarks on vision tasks are listed to help researchers gain a ...
International audienceThis paper proposes a novel multimodal fusion approach, aiming to produce best...
This thesis focuses on proposing and addressing various tasks in the field of vision and language, a...
2019-01-29Multimodal reasoning focuses on learning the correlation between different modalities pres...
International audienceIn recent years, deep learning algorithms have rapidly revolutionized artifici...
A fundamental goal of computer vision is to discover the semantic information within a given scene, ...
Multitask Learning is a novel machine learning approach that learns each problem better by also lear...
The focus of this survey is on the analysis of two modalities of multimodal deep learning: image and...
Deep learning has been overwhelmingly successful in computer vision (CV), natural language processin...
In recent years, joint text-image embeddings have significantly improved thanks to the development o...
Deep learning has achieved state-of-the-art performances in several research applications nowadays: ...
AbstractCurrent Machine learning algorithms are highly dependent on manually designing features and ...
Abstract The focus of this survey is on the analysis of two modalities of multimodal deep learning:...
Multimodal machine learning (MML) is a tempting multidisciplinary research area where heterogeneous ...
Computer vision is the science related to teaching machines to see and understand digital images or ...
Large-scale pretrained foundation models have been an emerging paradigm for building artificial inte...
International audienceThis paper proposes a novel multimodal fusion approach, aiming to produce best...
This thesis focuses on proposing and addressing various tasks in the field of vision and language, a...
2019-01-29Multimodal reasoning focuses on learning the correlation between different modalities pres...
International audienceIn recent years, deep learning algorithms have rapidly revolutionized artifici...
A fundamental goal of computer vision is to discover the semantic information within a given scene, ...
Multitask Learning is a novel machine learning approach that learns each problem better by also lear...
The focus of this survey is on the analysis of two modalities of multimodal deep learning: image and...
Deep learning has been overwhelmingly successful in computer vision (CV), natural language processin...
In recent years, joint text-image embeddings have significantly improved thanks to the development o...
Deep learning has achieved state-of-the-art performances in several research applications nowadays: ...
AbstractCurrent Machine learning algorithms are highly dependent on manually designing features and ...
Abstract The focus of this survey is on the analysis of two modalities of multimodal deep learning:...
Multimodal machine learning (MML) is a tempting multidisciplinary research area where heterogeneous ...
Computer vision is the science related to teaching machines to see and understand digital images or ...
Large-scale pretrained foundation models have been an emerging paradigm for building artificial inte...
International audienceThis paper proposes a novel multimodal fusion approach, aiming to produce best...
This thesis focuses on proposing and addressing various tasks in the field of vision and language, a...
2019-01-29Multimodal reasoning focuses on learning the correlation between different modalities pres...