As larger multimodal datasets are becoming available on the web, the possibility for better, more human-like multimodal models grows. My research goal is to evaluate what multimodality brings to machine representation of data, especially when it comes to generalizing in one or two modalities (image and/or text), as well as to find ways of improving the quality of the latent space of multimodal algorithms. Bigger datasets and larger computational power enable better algorithms to be developed, but in this project, I aim at using as little data as possible, with as few annotations as possible, to improve the multimodal representation of pretrained algorithms. There has been great progress in multimodal dataset availability, mostly due to the ...
Recent research in Deep Learning has sent the quality of results in multimedia tasks rocketing: than...
The focus of this survey is on the analysis of two modalities of multimodal deep learning: image and...
Exploiting multimedia documents leads to representation problems of the textual and visual informati...
This thesis deals with multimodal image annotation in the context of social media. We seek to take a...
We are currently experiencing an exceptional growth of visual data, for example, millions of photos ...
This dissertation delves into the use of textual metadata for image understanding. We seek to exploi...
When learning about the world, inputs can come in all sorts of ways: images when we look around, tex...
In this dissertation, the thesis that deep neural networks are suited for analysis of visual, textua...
Recent technological advances in the acquisition of multimedia data have led to an exponential growt...
One of the key factors driving the success of machine learning for scene understanding is the develo...
Digital technologies have become instrumental in transforming our society. Recent statistical method...
Les mécanismes de compréhension chez l'être humain sont par essence multimodaux. Comprendre le monde...
The attention mechanism is an important part of the neural machine translation (NMT) where it was re...
Multimodal learning involves the use of multiple senses (touch, visual, auditory, etc.) during the l...
L'interaction entre le langage et la vision reste relativement peu explorée malgré un intérêt grandi...
Recent research in Deep Learning has sent the quality of results in multimedia tasks rocketing: than...
The focus of this survey is on the analysis of two modalities of multimodal deep learning: image and...
Exploiting multimedia documents leads to representation problems of the textual and visual informati...
This thesis deals with multimodal image annotation in the context of social media. We seek to take a...
We are currently experiencing an exceptional growth of visual data, for example, millions of photos ...
This dissertation delves into the use of textual metadata for image understanding. We seek to exploi...
When learning about the world, inputs can come in all sorts of ways: images when we look around, tex...
In this dissertation, the thesis that deep neural networks are suited for analysis of visual, textua...
Recent technological advances in the acquisition of multimedia data have led to an exponential growt...
One of the key factors driving the success of machine learning for scene understanding is the develo...
Digital technologies have become instrumental in transforming our society. Recent statistical method...
Les mécanismes de compréhension chez l'être humain sont par essence multimodaux. Comprendre le monde...
The attention mechanism is an important part of the neural machine translation (NMT) where it was re...
Multimodal learning involves the use of multiple senses (touch, visual, auditory, etc.) during the l...
L'interaction entre le langage et la vision reste relativement peu explorée malgré un intérêt grandi...
Recent research in Deep Learning has sent the quality of results in multimedia tasks rocketing: than...
The focus of this survey is on the analysis of two modalities of multimodal deep learning: image and...
Exploiting multimedia documents leads to representation problems of the textual and visual informati...