Current state-of-the art object detection and recognition algorithms mainly use supervised training, and most benchmark datasets contain only static images. In this work we study feature learning in the context of temporally coherent video data. We focus on training convolutional features on unlabeled video data, using only the assumption that adjacent video frames contain semantically similar information. This assumption is exploited to train a pooling auto-encoder model regularized by slowness and sparsity. First, we confirm that fully connected networks mainly learn features stable under translation. Insipred by this observation, we proceed to train convolutional slow features which reveal richer invariants that are learned from natural ...
We present a novel hierarchical, distributed model for unsupervised learning of invariant spatio-tem...
We present a novel hierarchical and distributed model for learning invariant spatio-temporal feature...
The object detection problem is composed of two main tasks, object localization and object classifi...
We apply salient feature detection and tracking in videos to simulate fixations and smooth pursuit i...
We present in this paper a novel learning-based approach for video sequence classifi-cation. Contrar...
International audienceWe present in this paper a novel learning-based approach for video sequence cl...
With the development of deep neural networks, many object detection frameworks have shown great succ...
This thesis focuses on video understanding for human action and interaction recognition. We start by...
Most video based action recognition approaches create the video-level representation by temporally p...
© 20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for a...
This paper instroduces an unsupervised framework to extract semantically rich features for video rep...
As the success of deep models has led to their deployment in all areas of computer vision, it is inc...
We apply salient feature detection and tracking in videos to simulate fixations and smooth pursuit i...
Common video representations often deploy an average or maximum pooling of pre-extracted frame featu...
Abstract—At the core of vision research is the notion of perceptual invariance. The question of how ...
We present a novel hierarchical, distributed model for unsupervised learning of invariant spatio-tem...
We present a novel hierarchical and distributed model for learning invariant spatio-temporal feature...
The object detection problem is composed of two main tasks, object localization and object classifi...
We apply salient feature detection and tracking in videos to simulate fixations and smooth pursuit i...
We present in this paper a novel learning-based approach for video sequence classifi-cation. Contrar...
International audienceWe present in this paper a novel learning-based approach for video sequence cl...
With the development of deep neural networks, many object detection frameworks have shown great succ...
This thesis focuses on video understanding for human action and interaction recognition. We start by...
Most video based action recognition approaches create the video-level representation by temporally p...
© 20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for a...
This paper instroduces an unsupervised framework to extract semantically rich features for video rep...
As the success of deep models has led to their deployment in all areas of computer vision, it is inc...
We apply salient feature detection and tracking in videos to simulate fixations and smooth pursuit i...
Common video representations often deploy an average or maximum pooling of pre-extracted frame featu...
Abstract—At the core of vision research is the notion of perceptual invariance. The question of how ...
We present a novel hierarchical, distributed model for unsupervised learning of invariant spatio-tem...
We present a novel hierarchical and distributed model for learning invariant spatio-temporal feature...
The object detection problem is composed of two main tasks, object localization and object classifi...