Identifying relations between objects is central to understanding the scene. While several works have been proposed for relation modeling in the image domain, there have been many constraints in the video domain due to challenging dynamics of spatio-temporal interactions (e.g., Between which objects are there an interaction? When do relations occur and end?). To date, two representative methods have been proposed to tackle Video Visual Relation Detection (VidVRD): segment-based and window-based. We first point out the limitations these two methods have and propose Temporal Span Proposal Network (TSPN), a novel method with two advantages in terms of efficiency and effectiveness. 1) TSPN tells what to look: it sparsifies relation search space...
This paper shows how a standard convolutional neural network (CNN) without recurrent connections is ...
In this paper, we present a novel method to explore semantically meaningful visual information and i...
Visual relations form the basis of understanding our compositional world, as relationships between v...
This thesis addresses two visual understanding tasks: visual relationship detection (VRD) and video ...
Visual relation detection task is the bridge between semantic text and image information, and it can...
The task of text-video retrieval aims to understand the correspondence between language and vision, ...
The aim of visual relation detection is to provide a comprehensive understanding of an image by desc...
The aim of visual relation detection is to provide a comprehensive understanding of an image by desc...
Video question answering is the task of automatically answering questions about videos. Apart from d...
Action detection is an essential and challenging task, especially for densely labelled datasets of u...
Scene graph construction / visual relationship detection from an image aims to give a precise struct...
Oral PresentationInternational audienceAction detection is an essential and challenging task, especi...
A defining characteristic of natural vision is its ability to withstand a variety of input alteratio...
The key of Human-Object Interaction(HOI) recognition is to infer the relationship between human and ...
The context of this work is to characterize the content and the structure of audiovisual documents b...
This paper shows how a standard convolutional neural network (CNN) without recurrent connections is ...
In this paper, we present a novel method to explore semantically meaningful visual information and i...
Visual relations form the basis of understanding our compositional world, as relationships between v...
This thesis addresses two visual understanding tasks: visual relationship detection (VRD) and video ...
Visual relation detection task is the bridge between semantic text and image information, and it can...
The task of text-video retrieval aims to understand the correspondence between language and vision, ...
The aim of visual relation detection is to provide a comprehensive understanding of an image by desc...
The aim of visual relation detection is to provide a comprehensive understanding of an image by desc...
Video question answering is the task of automatically answering questions about videos. Apart from d...
Action detection is an essential and challenging task, especially for densely labelled datasets of u...
Scene graph construction / visual relationship detection from an image aims to give a precise struct...
Oral PresentationInternational audienceAction detection is an essential and challenging task, especi...
A defining characteristic of natural vision is its ability to withstand a variety of input alteratio...
The key of Human-Object Interaction(HOI) recognition is to infer the relationship between human and ...
The context of this work is to characterize the content and the structure of audiovisual documents b...
This paper shows how a standard convolutional neural network (CNN) without recurrent connections is ...
In this paper, we present a novel method to explore semantically meaningful visual information and i...
Visual relations form the basis of understanding our compositional world, as relationships between v...