The goal of fine-grained action recognition is to discriminate between action categories that differ only in subtle details. To tackle this, we draw inspiration from the human visual system, which contains specialized brain regions dedicated to handling specific tasks. We design a novel Dynamic Spatio-Temporal Specialization (DSTS) module, which consists of specialized neurons that are activated only for a subset of highly similar samples. During training, the loss forces these specialized neurons to learn the discriminative fine-grained differences that distinguish such similar samples, improving fine-grained recognition. Moreover, a spatio-temporal specialization method further optimizes the architectures ...
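The abstract does not say how DSTS decides which samples activate which specialized neurons, so the sketch below substitutes a simple assumed mechanism to illustrate sample-dependent specialization: each group of neurons has a prototype vector, every sample is routed to the group whose prototype has the highest cosine similarity to it, and all other groups output zeros. The class `SpecializedLayer`, its prototype-based routing and the ReLU activation are hypothetical choices for illustration, not the DSTS implementation.

```python
import numpy as np


class SpecializedLayer:
    """Hypothetical sketch of sample-dependent 'specialized neurons'.

    The layer holds n_groups groups of neurons. Each sample activates exactly one
    group, chosen by cosine similarity to a per-group prototype (an ASSUMED
    routing rule, not taken from DSTS); every other group outputs zeros.
    """

    def __init__(self, in_dim, neurons_per_group, n_groups, seed=0):
        rng = np.random.default_rng(seed)
        # One prototype vector per group (assumption: routing by prototype similarity).
        self.prototypes = rng.standard_normal((n_groups, in_dim))
        # A separate weight matrix for each group of specialized neurons.
        self.weights = 0.1 * rng.standard_normal((n_groups, in_dim, neurons_per_group))
        self.n_groups = n_groups
        self.neurons_per_group = neurons_per_group

    def route(self, x):
        """Return, for each row of x, the index of the most similar prototype."""
        x_unit = x / np.linalg.norm(x, axis=1, keepdims=True)
        p_unit = self.prototypes / np.linalg.norm(self.prototypes, axis=1, keepdims=True)
        return np.argmax(x_unit @ p_unit.T, axis=1)

    def forward(self, x):
        """Compute outputs where only the routed group of neurons is active."""
        groups = self.route(x)
        out = np.zeros((x.shape[0], self.n_groups, self.neurons_per_group))
        for g in range(self.n_groups):
            mask = groups == g
            if mask.any():
                # ReLU activation, computed only for the samples assigned to group g.
                out[mask, g] = np.maximum(x[mask] @ self.weights[g], 0.0)
        # Flatten to (batch, n_groups * neurons_per_group); inactive groups stay zero.
        return out.reshape(x.shape[0], -1), groups


# Usage: 8 random 16-dimensional samples, 3 groups of 4 neurons each.
layer = SpecializedLayer(in_dim=16, neurons_per_group=4, n_groups=3)
x = np.random.default_rng(1).standard_normal((8, 16))
y, groups = layer.forward(x)
```

Because a sample's inactive groups emit constant zeros, a loss on that sample produces no gradient for their weights, so each group is updated only by the mutually similar samples routed to it. The hard `argmax` routing is itself non-differentiable with respect to the prototypes; a trainable model would need a differentiable or separately optimized router.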
Action recognition requires the accurate analysis of action elements in the form of a video clip and...
Fine-grained action recognition involves comparing similar actions of variable length consi...
We present a biologically motivated system for the recognition of actions from video sequences. The ...
In this paper, we propose an approach to classify action sequences. We observe that in action sequen...
Extracting discriminative and robust features from video sequences is the first and most critical st...
Currently, many action recognition methods consider mainly the information from spatial streams. We ...
This thesis focuses on video understanding for human action and interaction recognition. We start by...
This paper introduces a fusion convolutional architecture for efficient learning of spatio-temporal ...
It is well known that the visual cortex efficiently processes high-dimensional spatial infor...
Slow Feature Analysis (SFA) extracts slowly varying features from a quickly varying input signal [1]...
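The one-sentence definition of SFA above can be made concrete with the standard linear formulation: center the signal, whiten it, then keep the directions along which the finite-difference time derivative has the smallest variance. The following is a minimal NumPy sketch of that textbook formulation, assuming a purely linear feature space; it is not taken from the work summarized above, and the function name `linear_sfa` and the toy two-source signal are illustrative.

```python
from typing import Optional

import numpy as np


def linear_sfa(x: np.ndarray, n_components: Optional[int] = None) -> np.ndarray:
    """Linear Slow Feature Analysis (illustrative sketch).

    x: array of shape (T, D), one row per time step.
    Returns an array of shape (T, n_components) whose columns are ordered from
    slowest to fastest varying, each with zero mean and unit variance.
    """
    # 1. Center the signal so every input dimension has zero mean.
    x = x - x.mean(axis=0)
    # 2. Whiten: rotate and rescale so the covariance becomes the identity.
    cov = x.T @ x / x.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)
    z = x @ (eigvecs / np.sqrt(eigvals))
    # 3. Approximate the time derivative with finite differences.
    dz = np.diff(z, axis=0)
    dcov = dz.T @ dz / dz.shape[0]
    # 4. Directions with the smallest derivative variance are the slowest;
    #    eigh returns eigenvalues in ascending order, so slowest come first.
    _, slow_dirs = np.linalg.eigh(dcov)
    y = z @ slow_dirs
    return y if n_components is None else y[:, :n_components]


# Usage: linearly mix a slow and a fast sinusoid, then unmix them with SFA.
t = np.linspace(0, 2 * np.pi, 1000)
sources = np.column_stack([np.sin(t), np.sin(11 * t)])  # slow, fast
mixing = np.array([[1.0, 2.0], [3.0, 1.0]])
x = sources @ mixing.T
y = linear_sfa(x)
```

Whitening first makes every decorrelated, unit-variance projection equally "large", so minimizing the variance of the derivative then ranks directions purely by how slowly they change rather than by their amplitude.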
Automated analysis of videos for content understanding is one of the most challenging and well resea...