Neural networks have been shown to be effective at learning rich low-dimensional representations of high-dimensional data such as images and text. There have also been many recent works using neural networks to learn a common embedding between data of different modalities, specifically between images and textual descriptions, a task commonly referred to as learning visual-semantic embeddings. This is typically achieved using separate encoders for images and text and a contrastive loss. Inspired by recent works in relational reasoning and graph neural networks, this work studies the effects of using a relational inductive bias on the quality of learned visual-semantic embeddings. Training and evaluation are done using caption-to-image and image-to-capti...
Image captioning is shown to be able to achieve a better performance by using scene graphs to repres...
Humans often perceive the physical world as sets of relations between objects, whatever nature (visu...
Current deep learning approaches have shown good in-distribution generalization performance, but str...
The increasing interest in social networks, smart cities, and Industry 4.0 is encouraging the develo...
Relational data is ubiquitous in modern-day computing, and drives several key applications across mu...
Recent works in deep-learning research highlighted remarkable relational reasoning capabilities of s...
Visual Semantic Embedding (VSE) networks aim to extract the semantics of images and their descriptio...
Visual Relationship Detection (VRD) is a relatively young research area, where the goal is to develo...
Due to the powerful ability to learn low-level and high-level general visual features, deep neural n...
Visual-semantic embedding (VSE) networks create joint image–text representations to map images and t...
While modern deep neural architectures generalise well when test data is sampled from the same distr...
A thorough comprehension of image content demands a complex grasp of the inter...
Computer vision is moving from predicting discrete, categorical labels to generating rich descriptio...
Language and vision are the two most essential parts of human intelligence for interpreting the real...