While our representation of the world is shaped by our perceptions, our languages, and our interactions, they have traditionally been distinct fields of study in machine learning. Fortunately, this partitioning started opening up with the recent advents of deep learning methods, which standardized raw feature extraction across communities. However, multimodal neural architectures are still at their beginning, and deep reinforcement learning is often limited to constrained environments. Yet, we ideally aim to develop large-scale multimodal and interactive models towards correctly apprehending the complexity of the world. As a first milestone, this thesis focuses on visually grounded language learning for three reasons (i) they are both well-...
Artificial Intelligence (AI) has transformed the way we interact with technology e.g., chatbots, voi...
Grounding natural language onto real-world perception is a fundamental challenge to empower various ...
In recent years, deep learning methods allowed the creation of neural models that are able to proces...
Alors que nous nous représentons le monde au travers de nos sens, de notre langage et de nos interac...
In this dissertation, the thesis that deep neural networks are suited for analysis of visual, textua...
Most human language understanding is grounded in perception. There is thus growing interest in combi...
Code is available at http://ibm.biz/VisualHintsWe present VISUALHINTS, a novel environment for multi...
Code is available at http://ibm.biz/VisualHintsWe present VISUALHINTS, a novel environment for multi...
Ces dernières années, les méthodes d'apprentissage profond ont permis de créer des modèles neuronaux...
Ces dernières années, les méthodes d'apprentissage profond ont permis de créer des modèles neuronaux...
Digital technologies have become instrumental in transforming our society. Recent statistical method...
Multimodal conversation modeling is an important and challenging problem when building conversationa...
The world around us involves multiple modalities -- we see objects, feel texture, hear sounds, smell...
L'interaction entre le langage et la vision reste relativement peu explorée malgré un intérêt grandi...
Notre perception est par nature multimodale, i.e. fait appel à plusieurs de nos sens. Pour résoudre ...
Artificial Intelligence (AI) has transformed the way we interact with technology e.g., chatbots, voi...
Grounding natural language onto real-world perception is a fundamental challenge to empower various ...
In recent years, deep learning methods allowed the creation of neural models that are able to proces...
Alors que nous nous représentons le monde au travers de nos sens, de notre langage et de nos interac...
In this dissertation, the thesis that deep neural networks are suited for analysis of visual, textua...
Most human language understanding is grounded in perception. There is thus growing interest in combi...
Code is available at http://ibm.biz/VisualHintsWe present VISUALHINTS, a novel environment for multi...
Code is available at http://ibm.biz/VisualHintsWe present VISUALHINTS, a novel environment for multi...
Ces dernières années, les méthodes d'apprentissage profond ont permis de créer des modèles neuronaux...
Ces dernières années, les méthodes d'apprentissage profond ont permis de créer des modèles neuronaux...
Digital technologies have become instrumental in transforming our society. Recent statistical method...
Multimodal conversation modeling is an important and challenging problem when building conversationa...
The world around us involves multiple modalities -- we see objects, feel texture, hear sounds, smell...
L'interaction entre le langage et la vision reste relativement peu explorée malgré un intérêt grandi...
Notre perception est par nature multimodale, i.e. fait appel à plusieurs de nos sens. Pour résoudre ...
Artificial Intelligence (AI) has transformed the way we interact with technology e.g., chatbots, voi...
Grounding natural language onto real-world perception is a fundamental challenge to empower various ...
In recent years, deep learning methods allowed the creation of neural models that are able to proces...