Temporal segmentation of events is an essential task and a precursor to the automatic recognition of human actions in video. Several attempts have been made to capture frame-level salient aspects through attention, but they cannot effectively map the temporal relationships between frames because they capture only a limited span of temporal dependencies. To this end, we propose an end-to-end supervised learning approach that better learns relationships between actions over time, thus improving overall segmentation performance. The proposed hierarchical recurrent attention framework analyses the input video at multiple temporal scales to form embeddings at the frame level and the segment level, and perform fine-grai...
Most recent approaches for action recognition from video leverage deep architectures to encode the v...
Human action recognition in videos is an important task with a broad range of applications. In this ...
Long-Term activities involve humans performing complex, minutes-long actions. Differently than in tr...
In this paper, we propose Hierarchical Action Segmentation Refiner (HASR), which can refine temporal...
This thesis focuses on video understanding for human action and interaction recognition. We start by...
In this paper we address the problem of continuous fine-grained action segmentation, in which multip...
In this paper we address the problem of continuous fine-grained action segmentation, in which multip...
We introduce a simple yet effective network that embeds a novel Discriminative Feature Pooling (DFP)...
In this dissertation, I present my work towards exploring temporal information for better video unde...
In this dissertation, I present my work towards exploring temporal information for better video unde...
In this paper, we newly introduce the concept of temporal attention filters, and describe how they c...
Most recent approaches for action recognition from video leverage deep architectures to encode the v...
We generate massive amounts of video data every day. While most real-world videos are long and untri...
We generate massive amounts of video data every day. While most real-world videos are long and untri...