We apply salient feature detection and tracking in videos to simulate fixations and smooth pursuit in human vision. With tracked sequences as input, a hierarchical network of modules learns invariant features using a temporal slowness constraint. The network encodes invariance which are increasingly complex with hierarchy. Although learned from videos, our features are spatial instead of spatial-temporal, and well suited for extracting features from still images. We applied our features to four datasets (COIL-100, Caltech 101, STL-10, PubFig), and observe a consistent improvement of 4 % to 5 % in classification accuracy. With this approach, we achieve state-of-the-art recognition accuracy 61 % on STL-10 dataset. 1 Motivatio
A video saliency detection algorithm based on feature learning, called ROCT, is proposed in this wor...
Recognizing actions is one of the important challenges in computer vision with respect to video data...
Local spatio-temporal salient features are used for a sparse and compact representation of video con...
We apply salient feature detection and tracking in videos to simulate fixations and smooth pursuit i...
Abstract — Since visual attention-based computer vision appli-cations have gained popularity, ever m...
In this paper, we introduce a novel technique for image matching and feature-based tracking. The tec...
In this paper, we propose an approach to learn hierarchical features for visual object tracking. Fir...
Abstract—At the core of vision research is the notion of perceptual invariance. The question of how ...
Current state-of-the art object detection and recognition algorithms mainly use supervised training,...
Abstract—Systems based on bag-of-words models from image features collected at maxima of sparse inte...
ABSTRACT : Human action recognition is still a challenging problem and researchers are focusing to ...
Visual representation is crucial for visual tracking method׳s performances. Conventionally, visual r...
Several spatiotemporal feature point detectors have been recently used in video analysis for action ...
Slow Feature Analysis (SFA) extracts slowly varying features from a quickly varying input signal [1]...
Human action recognition is still a challenging problem and researchers are focusing to investigate ...
A video saliency detection algorithm based on feature learning, called ROCT, is proposed in this wor...
Recognizing actions is one of the important challenges in computer vision with respect to video data...
Local spatio-temporal salient features are used for a sparse and compact representation of video con...
We apply salient feature detection and tracking in videos to simulate fixations and smooth pursuit i...
Abstract — Since visual attention-based computer vision appli-cations have gained popularity, ever m...
In this paper, we introduce a novel technique for image matching and feature-based tracking. The tec...
In this paper, we propose an approach to learn hierarchical features for visual object tracking. Fir...
Abstract—At the core of vision research is the notion of perceptual invariance. The question of how ...
Current state-of-the art object detection and recognition algorithms mainly use supervised training,...
Abstract—Systems based on bag-of-words models from image features collected at maxima of sparse inte...
ABSTRACT : Human action recognition is still a challenging problem and researchers are focusing to ...
Visual representation is crucial for visual tracking method׳s performances. Conventionally, visual r...
Several spatiotemporal feature point detectors have been recently used in video analysis for action ...
Slow Feature Analysis (SFA) extracts slowly varying features from a quickly varying input signal [1]...
Human action recognition is still a challenging problem and researchers are focusing to investigate ...
A video saliency detection algorithm based on feature learning, called ROCT, is proposed in this wor...
Recognizing actions is one of the important challenges in computer vision with respect to video data...
Local spatio-temporal salient features are used for a sparse and compact representation of video con...