The rise of deep learning has facilitated remarkable progress in video understanding. This thesis addresses three important tasks of video understanding: video object detection, joint object and action detection, and spatio-temporal action localization.Object class detection is one of the most important challenges in computer vision. Object detectors are usually trained on bounding-boxes from still images. Recently, video has been used as an alternative source of data. Yet, training an object detector on one domain (either still images or videos) and testing on the other one results in a significant performance gap compared to training and testing on the same domain. In the first part of this thesis, we examine the reasons behind this perfo...
This dissertation targets the recognition of human actions in realistic video data, such as movies. ...
International audienceIn this paper, we propose a new framework for action localization that tracks ...
International audienceObject detectors are typically trained on a large set of still images annotate...
The rise of deep learning has facilitated remarkable progress in video understanding. This thesis ad...
International audienceWhile most existing approaches for detection in videos focus on objects or hum...
International audienceCurrent state-of-the-art approaches for spatio-temporal action detection rely ...
Human behavior understanding is a fundamental problem of computer vision. It is an important compone...
As imaging systems become ubiquitous, the ability to recognize human actions is becoming increasingl...
With the rapid growth of digital video content, automaticvideo understanding has become an increasin...
We address the problem of action detection in videos. Driven by the latest progress in object detect...
This paper contributes to automatic classification and localization of human actions in video. Where...
We address the problem of action detection in videos. Driven by the latest progress in object detect...
This paper strives for spatio-temporal localization of human actions in videos. In the literature, t...
In this work, we propose an approach to the spatiotemporal localisation (detection) and classificati...
This thesis investigates the role of objects for the spatio-temporal recognition of activities in vi...
This dissertation targets the recognition of human actions in realistic video data, such as movies. ...
International audienceIn this paper, we propose a new framework for action localization that tracks ...
International audienceObject detectors are typically trained on a large set of still images annotate...
The rise of deep learning has facilitated remarkable progress in video understanding. This thesis ad...
International audienceWhile most existing approaches for detection in videos focus on objects or hum...
International audienceCurrent state-of-the-art approaches for spatio-temporal action detection rely ...
Human behavior understanding is a fundamental problem of computer vision. It is an important compone...
As imaging systems become ubiquitous, the ability to recognize human actions is becoming increasingl...
With the rapid growth of digital video content, automaticvideo understanding has become an increasin...
We address the problem of action detection in videos. Driven by the latest progress in object detect...
This paper contributes to automatic classification and localization of human actions in video. Where...
We address the problem of action detection in videos. Driven by the latest progress in object detect...
This paper strives for spatio-temporal localization of human actions in videos. In the literature, t...
In this work, we propose an approach to the spatiotemporal localisation (detection) and classificati...
This thesis investigates the role of objects for the spatio-temporal recognition of activities in vi...
This dissertation targets the recognition of human actions in realistic video data, such as movies. ...
International audienceIn this paper, we propose a new framework for action localization that tracks ...
International audienceObject detectors are typically trained on a large set of still images annotate...