Deep neural networks have recently achieved competitive accuracy for human activity recognition. However, there is room for improvement, especially in modeling long-term temporal importance and in determining the activity relevance of different temporal segments in a video. To address this problem, we propose a learnable and differentiable module: Deep Adaptive Temporal Pooling (DATP). DATP applies a self-attention mechanism to adaptively pool the classification scores of different video segments. Specifically, using frame-level features, DATP regresses the importance of different temporal segments and generates weights for them. Remarkably, DATP is trained using only the video-level label. No additional supervision is needed except vid...
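The pooling step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how importance is regressed, so the single linear scorer (`w`, `b`), the softmax normalization, and all names and shapes below are assumptions made for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=axis, keepdims=True)


def adaptive_temporal_pool(segment_features, segment_scores, w, b=0.0):
    """Pool per-segment class scores with learned importance weights.

    segment_features: (T, D) array, one feature vector per temporal segment.
    segment_scores:   (T, C) array, per-segment classification scores.
    w, b:             hypothetical linear scorer (assumed, not from the paper).
    Returns the pooled (C,) video-level scores and the (T,) segment weights.
    """
    importance = segment_features @ w + b      # (T,) one scalar per segment
    weights = softmax(importance)              # (T,) non-negative, sums to 1
    video_scores = weights @ segment_scores    # (C,) weighted sum over segments
    return video_scores, weights


# Toy inputs: 4 segments, 8-dimensional features, 3 activity classes.
rng = np.random.default_rng(0)
T, D, C = 4, 8, 3
features = rng.normal(size=(T, D))
scores = rng.normal(size=(T, C))
w = rng.normal(size=D)

video_scores, weights = adaptive_temporal_pool(features, scores, w)

# With a zero scorer every segment is equally important, so the pooling
# reduces to plain average pooling over segments.
uniform_scores, uniform_weights = adaptive_temporal_pool(features, scores, np.zeros(D))
```

Because every operation here is differentiable, a loss computed on `video_scores` against the video-level label alone would propagate gradients into the scorer, which is consistent with the claim that no segment-level supervision is required.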
The temporal component of videos provides an important clue for activity recog...
Deep convolutional neural networks have lately dominated scene understanding tasks, particularly tho...
We generate massive amounts of video data every day. While most real-world videos are long and untri...
Technological innovation in the field of video action recognition drives the development of video-ba...
Rank pooling is a temporal encoding method that summarizes the dynamics of a video sequence to a sin...
The amount of video content generated increases daily; three hundred hours of video content is uploa...
Recent advances in deep neural networks have been successfully demonstrated with fairly good accurac...
The tremendous growth in video data, both on the internet and in real life, has encouraged the devel...
We propose a function-based temporal pooling method that captures the latent structure of the video ...
Classification of human actions from real-world video data is one of the most important topics in co...
Two-stream human recognition achieved great success in the development of video action recognition u...
Research in human action recognition has accelerated sign...
This thesis focuses on video understanding for human action and interaction recognition. We start by...
Research in human action recognition has accelerated significantly since the introduction of powerfu...